DETERMINING KINASE SPECIFICITY



This application claims priority from U.S. Application Ser. No.
10/660,370 filed September 11, 2003, the contents of which are incorporated
herein in their entirety.

Government Funding 

The invention described herein was developed with support from the 
National Institutes of Health. The U.S. Government has certain rights in the 
invention.

Field of the Invention 

The invention relates to methods, articles, software and kits for
determining the spectrum of peptidyl sequences that are recognized and
phosphorylated by a kinase, to peptides that include kinase recognition sites, and
to binding entities that specifically distinguish phosphorylated from
non-phosphorylated peptidyl sequences.

Background of the Invention 
The activity of cells is regulated by external signals that stimulate or 
inhibit intracellular events. The process by which stimulatory or inhibitory 
signals are transmitted into and within a cell to elicit an intracellular response is
referred to as signal transduction. Proper signal transduction is essential for
proper cellular function. Defects in various components of signal transduction
pathways, from cell surface receptors to activators of gene transcription, account
for a vast number of diseases, including numerous forms of cancer, vascular
diseases and neuronal diseases.

Signal transduction is largely mediated by protein kinases. Protein 
kinases are enzymes that phosphorylate other proteins and/or themselves
(auto-phosphorylation). A major rate-limiting problem in understanding signal
transduction within cells is to determine which kinase phosphorylates which
protein substrate at which sites within the protein substrate.

Eukaryotic protein kinases are numerous and diverse; there are more than
500 human genes that encode different protein kinases (Manning G et al. 2002.
Science 298: 1912-1934). Eukaryotic protein kinases that are involved in signal
transduction can be divided into three major groups based upon their substrate
utilization. First, the protein-tyrosine specific kinases can phosphorylate
substrates on tyrosine residues. Second, the protein-serine/threonine specific
kinases can phosphorylate substrates at serine and/or threonine residues. Finally,
the dual-specificity kinases can phosphorylate substrates at tyrosine, serine
and/or threonine residues.

In order to ensure fidelity in intracellular signal transduction cascades it is
essential that each protein kinase have exquisite specificity for its target
substrate(s). In general, kinases appear to phosphorylate multiple different target
sites on multiple proteins, thereby allowing branching of an initial signal
delivered to a cell in multiple directions in order to coordinate a set of events that
occur in parallel for a given cellular response (see, for example, Roach, P. J.
(1991) J. Biol. Chem. 266:14139-14142).

The substrate specificity of a protein kinase can be influenced by at least
three general mechanisms that depend on the overall structure of the enzyme.
First, specific domains in certain protein kinases can target the kinase to specific
locations in the cell, thereby restricting the substrate availability of the kinase.
Second, domains in the kinase, distinct from its catalytic domain, may provide
high affinity association with either the substrate or an adapter molecule that
presents the substrate to the kinase. Finally, kinase specificity is ultimately
provided by the structure of the catalytic site of the protein kinase that drives it
to select one peptide substrate sequence over another.

Although the number of protein kinases that have been implicated in
intracellular signaling is quite large, detailed information about the sequence
specificity of these kinases is available for only a limited number of them.
Shortcomings in the available approaches for detailed characterization
of kinase specificity are largely responsible for this scarcity of information.

One systematic approach to characterization of kinase specificity involves
collecting information on many specific substrates for a kinase and determining
common features amongst the substrate sequences (Kreegipuu A et al. 1998.
FEBS Lett 430:45-50). Such determination of the individual substrates is a
laborious and largely empirical process, making this a slow and relatively
inefficient way to derive comprehensive information on kinase specificity.

Serine/threonine kinases can be subdivided by peptide specificity into
three broad classes: basophilic kinases that phosphorylate sites with clusters of
positively charged amino acid residues, acidophilic kinases that phosphorylate
sites with clusters of negatively charged amino acid residues and
proline-directed kinases that phosphorylate sites in which Ser/Thr is followed
immediately by a proline (i.e. proline is at the P+1 position).

In the early 1990s, Cantley and colleagues invented a method that
attempts to accurately predict the spectrum of good peptide substrates for a
kinase (see U.S. Patent No. 5,532,167; Songyang et al. (1994) Curr. Biol.
4:973-982). Predictions of substrate specificity made by this method are available
at a website at scansite.mit.edu/. See also, Obenauer et al. (2003) Nucleic Acids
Res. 31:3635-3641; Yaffe et al. (2001) Nat. Biotechnol. 19:348-353. Other workers
have tested the specificities of kinases using one or more known substrates. See,
Himpel et al. (2000) J. Biol. Chem. 275:2431-2438; Velentza et al. (2001) J.
Biol. Chem. 276:38956-38965; Dostmann et al. (1999) Pharmacol. Ther.
82:373-387; Tegge et al. (1998) Methods Mol Biol 87:99-106; Tegge et al. (1995)
Biochemistry 34:10569-10577.

Limitations typical of these previous approaches include a failure to 
validate the substrate specificities indicated by the methods employed, a 
propensity for seeking optimal substrate sequences rather than defining the
universe of preferred substrates, and/or assumptions that a method provides 
general information when it may provide rather narrow information. Thus, there 
is a need for an alternative method to accurately characterize the universe of 
preferred substrates for kinases. 


Summary of the Invention 

The invention relates to determination of the range of substrate 
specificities of protein kinases, to prediction of sites on sequenced proteins that 
are most likely to be phosphorylated by each kinase studied, to visual 
representation of those kinase specificities, to validation in vitro that peptides
corresponding to those predicted sites are indeed phosphorylated by each kinase
studied, and to validation of phosphorylation of those sites in vivo. The
invention provides a simple and efficient method for determining the amino acid
residue preferences for peptidyl sequences phosphorylated by a kinase, as well
as for predicting which sites will be preferentially phosphorylated by the kinase,
and software that facilitates those methods. The invention also provides an
informative graphical format for visually representing that information and
software to output data in that format. Peptide sequences proven to be well
phosphorylated by protein kinase C are also provided.

In one embodiment, the invention provides a test set of peptide pools for 
identifying kinase substrate specificities. Such a test set for characterizing 
substrate specificities of kinases has at least two peptide pools. In general, 
substantially every peptide in each of the peptide pools includes one defined
phosphorylatable amino acid position, one query amino acid position, at least
one anchor amino acid position, and at least one degenerate amino acid position.
Substantially every peptide of every peptide pool has an identical
phosphorylatable amino acid that can be phosphorylated by a kinase at the
phosphorylatable amino acid position. The query amino acid position is at a
defined position relative to the phosphorylatable amino acid position within
substantially every peptide of every peptide pool, but a query amino acid's
identity at the query amino acid position is systematically varied from one
peptide pool to the next peptide pool within the test set of peptide pools. Each
anchor amino acid position is at a defined position relative to the
phosphorylatable amino acid position within substantially every peptide of every
peptide pool and each anchor amino acid position has an identical anchor amino
acid at that anchor amino acid position within every peptide of every peptide
pool. Each degenerate amino acid position within every peptide of every peptide
pool is occupied by an amino acid from a defined mixture of amino acids. In
some embodiments, the query amino acid position is not adjacent to an anchor
amino acid position or the query amino acid position is not adjacent to the
phosphorylatable amino acid position in any peptide pool of the test set. In some
test sets of the invention, no anchor amino acid positions (or anchor amino acids)
are present. However, such test sets do have a phosphorylatable amino acid
position, and at least one query amino acid position. Such "anchor-free" test sets
will also generally have at least one degenerate amino acid position.
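
By way of illustration only, the layout of such a test set can be sketched in
Python. The names build_test_set and PeptidePool, the lowercase "d" marker for a
degenerate position, and the example anchor (phenylalanine at P+1) are
assumptions made for this sketch; they are not taken from the invention as
described. The sketch enforces the optional constraint that the query position
coincides with neither the phosphorylatable position nor an anchor position.

```python
from dataclasses import dataclass

# One possible defined mixture: all natural amino acids except cysteine.
DEGENERATE_MIX = "ADEFGHIKLMNPQRSTVWY"


@dataclass(frozen=True)
class PeptidePool:
    query_residue: str  # residue placed at the query position in this pool
    pattern: str        # one character per position; 'd' marks a degenerate position


def build_test_set(width_n, width_c, phospho, query_offset, anchors, query_residues):
    """Return one pool description per query residue.

    Offsets are relative to the phosphorylatable residue (offset 0);
    negative offsets are N-terminal, positive offsets are C-terminal.
    `anchors` maps an offset to the fixed anchor residue (hypothetical input).
    """
    if query_offset == 0 or query_offset in anchors:
        raise ValueError("query position must differ from phospho and anchor positions")
    pools = []
    for query in query_residues:
        cells = []
        for offset in range(-width_n, width_c + 1):
            if offset == 0:
                cells.append(phospho)          # identical in every pool
            elif offset == query_offset:
                cells.append(query)            # varied from pool to pool
            elif offset in anchors:
                cells.append(anchors[offset])  # identical in every pool
            else:
                cells.append("d")              # drawn from the defined mixture
        pools.append(PeptidePool(query, "".join(cells)))
    return pools
```

For example, build_test_set(4, 3, "S", -2, {1: "F"}, "ARFD") describes four
pools that differ only in the residue two positions N-terminal to the serine.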

In other embodiments, the invention provides a test set like those 
described above except that every peptide of every peptide pool has an identical 
query amino acid but the position of the query amino acid relative to the
phosphorylatable amino acid position is systematically varied from one peptide
pool to the next peptide pool within the test set of peptide pools. One desirable
query amino acid to use in such a test set is arginine.

Another aspect of the invention is a test set for characterizing substrate
specificities of kinases that includes at least two peptide pools, wherein
substantially every peptide in each of the peptide pools includes one
phosphorylatable amino acid position, one query amino acid position, and at
least one degenerate amino acid position, and wherein: (a) each peptide of every
peptide pool has an identical phosphorylatable amino acid that can be
phosphorylated by a kinase at the phosphorylatable amino acid position; (b) the
query amino acid position is at a defined position relative to the
phosphorylatable amino acid position within every peptide of every peptide pool
but a query amino acid's identity at the query amino acid position is
systematically varied from one peptide pool to the next peptide pool within the
test set of peptide pools; (c) each degenerate amino acid position within every
peptide of every peptide pool is occupied by an amino acid selected from a
defined mixture of amino acids; and (d) the query amino acid position is not
adjacent to the phosphorylatable amino acid position in any peptide pool of the
test set. At least one degenerate position in each peptide pool in the test set can
be occupied by a defined mixture of more than five amino acids. Such a defined
mixture can include all natural amino acids except cysteine. Alternatively, each
amino acid's relative abundance in the defined mixture can be approximately
that amino acid's relative abundance in the human proteome. In some
embodiments, the defined mixture of amino acids includes arginine. Some of
the test sets of the invention have at least four peptide pools and each of the four
peptide pools has a different query amino acid. Some of the test sets of the
invention have a query amino acid position that is two positions N-terminal to
the phosphorylatable amino acid position. Other test sets of the invention have a
query amino acid position that is two positions C-terminal to the
phosphorylatable amino acid position. In some embodiments, one query amino
acid of the test set is arginine. The peptide pool of the test sets of the invention
can be a soluble mixture of peptides. Alternatively, substantially every peptide
in each peptide pool is attached to a solid support. In some embodiments,
substantially every peptide in each peptide pool is linked to biotin.

In other embodiments, the test sets of the invention are like those
described in the preceding paragraph but those test sets also have at least one
anchor amino acid position, wherein: (a) each anchor amino acid position is at a
defined position relative to the phosphorylatable amino acid position within
every peptide of every peptide pool and each anchor amino acid position has an
identical anchor amino acid at that anchor amino acid position within every
peptide of every peptide pool; and (b) the query amino acid position is not
adjacent to an anchor amino acid position in any peptide pool of the test set. In
some embodiments, at least one anchor amino acid is arginine. The anchor
amino acid position can be located one position C-terminal or one position
N-terminal to the phosphorylatable amino acid position. In other embodiments,
arginine is the anchor amino acid and the (arginine) anchor amino acid position
is located three positions N-terminal to the phosphorylatable amino acid
position. In some embodiments, every peptide in each of the peptide pools has
fewer than four anchor amino acids.

Another aspect of the invention is a test set for characterizing substrate 
specificities of kinases having at least two peptide pools, wherein every peptide 
in each of the peptide pools comprises one phosphorylatable amino acid 
position, one query amino acid, and at least one degenerate amino acid position,
and wherein: (a) each peptide of every peptide pool has an identical
phosphorylatable amino acid that can be phosphorylated by a kinase at the
phosphorylatable amino acid position; (b) every peptide of every peptide pool
has an identical query amino acid but the position of the query amino acid
relative to the phosphorylatable amino acid position is systematically varied
from one peptide pool to the next peptide pool within the test set of peptide
pools; and (c) each degenerate amino acid position within every peptide of every
peptide pool is occupied by an amino acid from a defined mixture of amino
acids. The query amino acid of this test set can be arginine. In this test set, each
peptide of every peptide pool can have at least one anchor amino acid position
that is at a defined position relative to the phosphorylatable amino acid position,
and each anchor amino acid position of peptides within a peptide pool can have
an identical anchor amino acid at that anchor amino acid position. In some
embodiments, the anchor amino acid of this test set is arginine and the anchor
amino acid position is two positions N-terminal to the phosphorylatable amino
acid position.

Another aspect of the invention is a test set of peptides for characterizing 
kinase substrate specificity that includes at least 50 separate peptides, each 
peptide having a sequence of between 6 and 30 amino acids, wherein each
peptide sequence is different from every other peptide sequence, and wherein at
least 50 peptides have two or more arginines within 6 amino acid positions of a
serine or threonine. Such a test set can have at least 96 separate peptides that
each include two or more arginines within 6 amino acid positions of a serine or
threonine. In another embodiment, at least half of the peptides in the test set
have two or more arginines within 6 residues of a serine or threonine. In a
further embodiment, at least 50 peptides have two or more arginines but two of
these arginines are not within 2 to 3 positions N-terminal to the serine or
threonine. In some of the test sets of the invention, at least 50 peptides have
three or more arginine residues within 6 residues of a serine or threonine. One
or more lysine residues can also be included within 6 residues of a serine or
threonine in the peptides of the test set. Substantially every peptide in some of
the test sets of the invention corresponds to a peptidyl sequence in a mammalian
protein and the peptidyl sequence is within 30 amino acids of the protein's
N-terminus or C-terminus.
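
The arginine criterion above can be checked mechanically. The following Python
sketch is offered only as an illustration; the function name
has_arginines_near_phosphosite and its parameters are hypothetical, and the
window is interpreted here as the 6 positions on either side of a serine or
threonine, which is one reading of "within 6 amino acid positions".

```python
def has_arginines_near_phosphosite(peptide, min_arg=2, window=6):
    """Return True if any Ser/Thr in `peptide` has at least `min_arg`
    arginines within `window` positions on either side of it."""
    for i, residue in enumerate(peptide):
        if residue in "ST":
            start = max(0, i - window)
            # Residues flanking the candidate site, excluding the site itself.
            nearby = peptide[start:i] + peptide[i + 1:i + 1 + window]
            if nearby.count("R") >= min_arg:
                return True
    return False


def count_qualifying(peptides, min_arg=2, window=6):
    """Count distinct peptides of 6 to 30 residues that meet the criterion."""
    return sum(
        1
        for p in set(peptides)
        if 6 <= len(p) <= 30 and has_arginines_near_phosphosite(p, min_arg, window)
    )
```

A candidate collection of peptides could then be compared against the thresholds
recited above (for example, at least 50 qualifying peptides) by calling
count_qualifying on the list of sequences.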

Another aspect of the invention is a peptide set comprising two or more 
pools of peptides, wherein each pool has peptides having substantially identical 
peptide sequences and the peptide sequences in each pool are selected from the 
group consisting essentially of SEQ ID NO: 76, 81, 82, 87, 89-92, 94, 97-99,
102, 104, 105, 108, 110, 112, 113, 121, 124, 127-129, 131-134, 136, 139, 143,
144, 149, 151-154, 160, 163-171, 173-177, 179, 182-192, 196-206, 208-211,
213-216, 474-516 or 517.

Another aspect of the invention is an isolated peptide having any one of 
SEQ ID NO: 76, 81, 82, 87, 89-92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113,
121, 124, 127-129, 131-134, 136, 139, 143, 144, 149, 151-154, 160, 163-171,
173-177, 179, 182-192, 196-206, 208-211, 213-216, 474-516 or 517. A serine
or threonine in the peptide can be phosphorylated. 



WO 2005/028666    PCT/US2004/029397



Another aspect of the invention is an isolated phosphorylated peptide 
having any one of SEQ ID NO: 298, 301-324, 326-347, 349-400, 402-410, 412-
473, 571-643 or 644. 

Another aspect of the invention is a binding entity whose binding
differentiates between a peptide having any one of SEQ ID NO: 76, 81, 82, 87,
89-92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113, 121, 124, 127-129,
131-134, 136, 139, 143, 144, 149, 151-154, 160, 163-171, 173-177, 179, 182-192,
196-206, 208-211, 213-216, 474-517 or 570, and the peptide after
phosphorylation by protein kinase C theta; wherein the binding entity has
substantially no binding to a phosphorylated peptide having SEQ ID NO: 229
(WKN-pS-IRH). Many of the antibodies of the invention recognize
phosphorylation sites at the N-termini and C-termini of mammalian proteins. In
some embodiments, the binding entity binds with greater affinity to the peptide
after phosphorylation than before phosphorylation. In other embodiments, the
binding entity binds with greater affinity to the peptide before phosphorylation
than after phosphorylation. The binding entity can, for example, be an antibody,
an antibody fragment or a mixture thereof. The peptide recognized by the
binding entity can be part of a mammalian protein. In some embodiments, the
peptide's sequence is within 30 amino acids of the N-terminus or C-terminus
of said protein. Examples of peptides recognized by the binding
entities of the invention include peptides having any one of SEQ ID NO: 89, 
102, 110, 112, 127, 177, 182, 209, 474-488 or 489. Other examples of peptides 
recognized by the binding entities of the invention include peptides having any 
one of SEQ ID NO: 173, 185, 192, 196, 200, 490-491 or 492. 

The binding characteristics of the binding entity can further differentiate
between a phosphorylated peptide having any one of SEQ ID NO: 298, 301-324,
326-347, 349-400, 402-410, 412-473, 571-643 or 644, and a
non-phosphorylated peptide that differs from the phosphorylated peptide by
substitution of Ser for the pSer or substitution of a Thr for the pThr. In some
embodiments, the phosphorylated peptide recognized by the binding entity can
have any one of SEQ ID NO: 298, 320, 324, 350, 351, 366, 388, 394, 398, 402, 418,
464, 571-595 or 596. In other embodiments, the phosphorylated peptide
recognized by the binding entity can have any one of SEQ ID NO: 301, 310, 317,
322, 344, 352, 371, 406, 597-599 or 600. For example, the phosphorylated
peptide recognized by the binding entity can have SEQ ID NO: 298.
Alternatively, the phosphorylated peptide recognized by the binding entity can
have SEQ ID NO: 313 or 314. Moreover, the phosphorylated peptide recognized
by the binding entity can have SEQ ID NO: 361 or 362.

The invention also provides a method for characterizing substrate
specificities of kinases that includes: contacting each peptide pool in at least two
test sets of peptide pools with ATP and a kinase; quantifying the amount of
phosphorylation in each peptide pool; and comparing the amount of
phosphorylation in each peptide pool with the amount of phosphorylation in at
least one other peptide pool. Test sets like those described above can be used in
the methods of the invention. Comparison of the amount of phosphorylation in
different peptide pools of a test set allows calculation of the preferences of the
kinase for each query residue, which differs between those pools. By testing
multiple test sets (for example, by using a superset described herein), a
position-specific scoring matrix (PSSM) can be derived, which reflects the amino
acid preferences of the kinase at positions around the phosphorylation position.
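
As a minimal sketch of the comparison step, the Python below converts
phosphorylation counts for the pools of a test set into per-residue scores. It
assumes that a Ratio-to-Mean value is the count for one pool divided by the mean
count over all pools in the same test set, and that a Log2 Score is the base-2
logarithm of that ratio; those definitions, the function names, and the floor
used to avoid taking the logarithm of zero are assumptions made for this
illustration rather than the formulas of the invention.

```python
import math


def ratio_to_mean(cpm_by_query):
    """Divide each pool's counts by the mean counts over the test set."""
    mean = sum(cpm_by_query.values()) / len(cpm_by_query)
    return {residue: cpm / mean for residue, cpm in cpm_by_query.items()}


def log2_scores(cpm_by_query, floor=1e-3):
    """Log2 of each ratio; `floor` is an assumed guard against log2(0)."""
    ratios = ratio_to_mean(cpm_by_query)
    return {residue: math.log2(max(r, floor)) for residue, r in ratios.items()}


def build_pssm(test_sets):
    """Build a PSSM as {position offset: {residue: log2 score}}.

    `test_sets` maps the query position offset of each test set (relative
    to the phosphorylation position) to that test set's {residue: counts}.
    """
    return {offset: log2_scores(cpm) for offset, cpm in test_sets.items()}
```

With this convention, a residue whose pool is phosphorylated twice as well as
the test-set average scores +1, and one phosphorylated half as well scores -1.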

The methods of the invention are flexible. For example, the same sets of 
degenerate peptides can be used to characterize many different kinases from 
every one of the millions of different biological species and an almost unlimited 
range of mutant kinases derived from each such kinase. Flexibility is also
present in the type of phosphorylation sites characterized by the methods of the
invention and in the number of query positions and residue types that are explored.
Moreover, the methods of the invention can also be modulated so that different
residues at a single position are tested, or the same residues are tested at different
positions. More than 500 peptide pools have been synthesized in more than 40
test sets, belonging to more than 6 supersets. 

The invention further provides a computer readable medium that includes 
computer-executable instructions, wherein the computer-executable instructions 
comprise conversion of input data into quantitative values specifying a 
preference value for each of a plurality of amino acids at each defined position in
a substrate peptide for a kinase, wherein: the input data comprises sequence and
phosphorylation data for a test set of peptides comprising at least two peptide
pools, wherein every peptide in each of the peptide pools comprises one
phosphorylatable amino acid position, and one query amino acid position,
wherein: each peptide of every peptide pool has an identical phosphorylatable
amino acid that can be phosphorylated by a kinase at the phosphorylatable amino
acid position; the query amino acid position is at the defined position relative to
the phosphorylatable amino acid position within every peptide of every peptide
pool but a query amino acid's identity at the query amino acid position is
systematically varied from one peptide pool to the next peptide pool within the
test set of peptide pools; a preference value for a particular amino acid at the
defined position is substantially determined from the amount of phosphorylation
of the peptide pool wherein that particular amino acid is the query residue and
the query position is located at the defined position.

The invention also provides a method for visual display of amino acid or 
nucleotide sequence preferences comprising a series of stacks of single letter 
symbols for amino acids or nucleotides, wherein each stack represents a position 
in a peptide or a nucleic acid sequence; each symbol's height is proportional to 
the absolute value of a quantitative parameter that is positive for favored amino
acids or nucleotides and negative for disfavored amino acids or nucleotides; each 
symbol's position within the stack is sorted from bottom to top in ascending 
value by the quantitative parameter. 
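
The stacking rule just described can be sketched in Python as follows. Only the
layout is computed (symbol order and heights); drawing is left out. The names
logo_stack and logo_layout and the scale parameter are hypothetical, and the
input is assumed to be a matrix of the form {position: {symbol: signed score}}.

```python
def logo_stack(scores, scale=1.0):
    """Lay out one stack as (symbol, height, signed score) tuples,
    ordered from the bottom of the stack to the top.

    Height is proportional to the absolute value of the score; symbols
    are sorted in ascending order of the signed score, so the most
    disfavored symbol is at the bottom and the most favored at the top.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [(symbol, abs(value) * scale, value) for symbol, value in ordered]


def logo_layout(matrix, scale=1.0):
    """Return the stacks as a linear array ordered by position."""
    return [(position, logo_stack(matrix[position], scale))
            for position in sorted(matrix)]
```

A renderer could then draw each tuple as a letter of the given height, using
the sign of the score to choose, for example, a color or an orientation.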

In another embodiment, the invention provides a computer readable 
medium having computer-executable instructions for performing a method of
visually displaying amino acid or nucleotide sequence preferences, the method
comprising: representing a position in a peptide or a nucleic acid sequence with
a stack of single letter symbols for amino acids or nucleotides; and displaying a
linear array of one or more stacks of letter symbols wherein each letter symbol's
height is proportional to the absolute value of a quantitative parameter that is
positive for favored amino acids or nucleotides and negative for disfavored
amino acids or nucleotides and wherein each letter symbol's position within the
stack is sorted from bottom to top in ascending order by the value of the
quantitative parameter.

The result of the graphic methods of the invention is a PSSM Logo,
which is a novel graphical format for conveying the specificity information in a
PSSM. It is particularly efficient in conveying both information on the preferred
residues and the disfavored residues, which act in concert to determine the
specificity of the kinase.

The present invention provides detailed information on the types of sites
and amino acid sequences that are recognized and phosphorylated by a kinase,
thereby permitting accurate prediction of which peptide sequences in the human
proteome can be phosphorylated by a particular kinase. Hence, computer
programs have been used to scan known well-defined human genes (15323).
Approximately 1900 human gene products were thereby identified that had at
least one Ser/Thr residue that was predicted to be phosphorylated by protein
kinase C (PKC) using a high stringency prediction criterion (better than 0.2
percentile). The PSSM-derived results obtained with supersets of peptides have
been extensively validated by demonstrating an excellent correlation between
peptides predicted to be phosphorylated in vitro by a kinase and those that are
phosphorylated in vitro by that kinase. Moreover, the biological relevance of the
in vitro phosphorylation is supported by comparison of sites identified with a
literature search defining sites phosphorylated in vivo.
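
A scan of this kind can be sketched in Python under two stated assumptions:
that the score of a candidate site is the sum of the PSSM values for the
residues found at each scored position around it, and that a percentile is the
percentage of background site scores that are at least as high as the score in
question (so that a smaller percentile is a stronger prediction). Both
assumptions, and the names score_site and top_percentile, are illustrative
only and are not the scoring scheme of the invention.

```python
def score_site(sequence, site_index, pssm, missing=0.0):
    """Sum PSSM values around the Ser/Thr at `site_index`.

    `pssm` has the form {position offset: {residue: score}}. Offsets that
    fall outside the sequence, and residues without a score, contribute
    `missing` (an assumed neutral value).
    """
    total = 0.0
    for offset, column in pssm.items():
        i = site_index + offset
        if 0 <= i < len(sequence):
            total += column.get(sequence[i], missing)
    return total


def top_percentile(score, background_scores):
    """Percentage of background scores that are >= `score`."""
    at_or_above = sum(1 for s in background_scores if s >= score)
    return 100.0 * at_or_above / len(background_scores)


def candidate_sites(sequence):
    """Indices of every serine or threonine in a protein sequence."""
    return [i for i, residue in enumerate(sequence) if residue in "ST"]
```

Under these assumptions, a high stringency criterion such as the one mentioned
above would correspond to keeping sites whose top_percentile value, computed
against the scores of all Ser/Thr sites scanned, is below 0.2.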


Brief Description of the Figures 

FIG. 1 provides examples of two test sets of peptide pools and results 
obtained with PKC-theta using the methods of the invention. 

FIG. 2 shows a superset of test sets designed for analysis of PKC
specificity from P-4 to P+3.

FIG. 3 provides counts per minute for in vitro phosphorylation by
PKC-theta of a superset of peptide pools designed for analysis of PKC specificity
from P-4 to P+3 for peptide pools shown in FIG. 2.

FIG. 4 provides Ratio-to-Mean values for different amino acid residues at
different positions when using PKC-theta for peptide pools shown in FIG. 2.

FIG. 5 provides a position-specific scoring matrix for PKC-theta using
the Log2 Score for peptide pools shown in FIG. 2.

FIG. 6 provides sequences of a superset of degenerate peptides designed
to extend analysis of PKC specificity.

FIG. 7 provides a position-specific scoring matrix for extended positions
using PKC-theta for peptide pools shown in FIG. 6.

FIG. 8 illustrates the differences between the previously available 
Sequence Logo for PKC (left) and a PSSM Logo of the invention for PKC-theta 
(right).

FIG. 9 illustrates a validation study testing our predictions for PKC-theta
and the previously available Scansite prediction for PKC-delta against results for
PKC-delta. Each point on a given panel is a different peptide. The x-axis
indicates a percentile prediction for phosphorylation of the peptide by PKC-theta
by our PSSM using data from P-4 to P+3 (panel A); by our PSSM using data
from P-7 to P+6; and from Scansite for PKC-delta. The y-axis indicates
phosphorylation of the peptide by PKC-delta expressed as percentage of
phosphorylation of the best peptide. Dashed lines indicate a reasonable
threshold for positive vs negative phosphorylation (at a value of 10%), and a
reasonable threshold for positive vs negative prediction (1st percentile). The
curved line is an approximation of where points would be found for an optimal
prediction. The results indicate that the predictions made according to the
invention are valid and are better than the previously available Scansite method.

FIG. 10 compares the sensitivity and specificity of the present methods
with those provided by a previously available Scansite method using PKC-delta
as the kinase.

FIG. 11 illustrates validation of the PKC-theta PSSM with a second set
of proteomic peptides that were chosen for synthesis/testing based on prior
knowledge of PSSM percentiles. Panel A shows results for individual peptides.
Panel B shows average results for groups of peptides grouped by PSSM
percentile predictions.

FIG. 12 illustrates core sequences of a superset of test sets with 1 anchor
position, represented by the formula d??R??S????d. Because of the importance
of 'R' at P-3 to many basophilic kinases, these test sets are particularly useful for
such basophilic kinases.

FIG. 13 illustrates a PSSM Logo for results of analysis of the kinase AKT1
with the d??R??S????d superset.

FIG. 14 illustrates proposed abundances of residues for use in degenerate
positions. Also illustrated are hydrophobicity scores for each residue that have
been used in the invention to score hydrophobicity of peptides/sequences.

FIG. 15 shows detection of specific phosphorylation of SHP-1 by 
Western blot analysis using a pPKC antibody wherein the phosphorylation is 
augmented through stimulation by the T-cell receptor.

FIG. 16 provides a chart showing that scores derived from different test
sets tested at different times are reproducible and scores extrapolated for
untested residues can be adequately predicted.

FIG. 17 provides a graph of the data provided in FIG. 16, illustrating that
scores derived from different test sets tested at different times are reproducible.

FIG. 18 illustrates how a peptide can be scored using data derived by the 
methods of the invention. 

FIG. 19 shows the distribution of scores observed when all Ser/Thr
containing sites in 15651 human proteins were scored with the PKC-theta PSSM
and shows the cutoffs for scores corresponding to particular low percentile
scores.

FIG. 20 illustrates that the PKC site prediction algorithm provided by the
invention correctly predicts previously known sites in the MARCKS protein.

FIG. 21 shows the high similarity in specificity between novel and
classical PKC isoforms, whereas atypical PKC differs more and great divergence
is seen with AKT1 and PKA. Values shown are the Pearson correlation coefficients
derived from comparison of phosphorylation of panels of peptides by the kinase
pair indicated.

FIG. 22 illustrates the differences between PSSM Logos of different
kinases analyzed with the same peptide supersets.

FIG. 23 illustrates validation studies that demonstrate that the predictions
made for PKC-zeta are valid and are better predictions for PKC-zeta than for
PKC-delta.

FIG. 24 illustrates scoring changes in peptides that are less
phosphorylated by PKC-zeta than by PKC-delta.

FIG. 25 illustrates position-specific residue preferences for PKA and
PKG determined using the PKC superset.

FIG. 26 illustrates the differences between PSSM Logos of different
mutant kinases derived from PKC-theta analyzed with the same peptide
supersets. A PSSM Logo for wild type kinase analyzed using low levels of ATP
is shown in the lower right corner.

FIG. 27 illustrates the detailed changes in amino acid preferences 
observed with PKC-theta mutant constructs and with altered kinase assay 
conditions.

FIG. 28 illustrates that details of residue preferences for PKC-theta depend
on the choices made for anchor and phosphorylation residues in the test sets
used.

FIG. 29 illustrates results for ROK-alpha with test sets based on the
??R??T???? peptide set with only 4 query residues.

FIG. 30 illustrates details of the R-Pair Anchor optimization set.

FIG. 31 illustrates results for analysis of PKA with the R-Pair set shown
in FIG. 30.

FIG. 32 shows that the R-Pair set reveals positions associated with the
strongest preference for arginine (R).

FIG. 33 shows detection of specific phosphorylation of LIMK-2 by 
Western blot with the pPKC antibody which is augmented following stimulation 
by the T-cell receptor. 

FIG. 34 shows detection of phosphorylation of MLK3 by Western blot
with the pPKC antibody.

FIG. 35 is a diagram of a computerized system in conjunction with which 
embodiments of the invention may be implemented. 

FIG. 36 shows RF-pair analysis for PKC-theta where the position of the
arginine (R) and phenylalanine (F) residues is varied in a peptide having the
sequence ddddddddSFddd, where "d" is a degenerate position in which either of
the arginine or phenylalanine residues can be placed. Each peptide consisted of
an N-terminal linker having a biotin-dansylated lysine and a glycine (BZG)
followed by a 13 residue insert. The phosphorylation reactions were performed
as described herein using PKC-theta as the kinase.

FIG. 37A-B shows average position-specific preferences of PKC-theta
determined by the RF-pair (FIG. 37A) and R-pair (FIG. 37B) sets of peptides
(see also FIGs. 30-32 and 36).

FIG. 38A-B illustrates that there is more than one strongly preferred
RF-pair peptide for PKC-theta. FIG. 38B provides the structures of peptides (where
"d" is a degenerate position) and their corresponding ratio-to-mean values with
log2 score.

FIG. 39A-B provides an analysis of phosphorylation by the kinase PAK
using an R-pair set of peptides. FIG. 39A is a chart showing how
phosphorylation by PAK varies as the positions of the first and second arginine
residues are varied within the peptide set. FIG. 39B provides a graph of the
Log2 score for arginine at various positions within a peptidyl sequence.

FIG. 40A-B provides an analysis of phosphorylation by the kinase PAK
using an RF-pair set of peptides. FIG. 40A is a chart showing how
phosphorylation by PAK varies as the positions of the arginine and
phenylalanine residues are varied within the peptide set. FIG. 40B provides a
graph of the Log2 score for arginine (diamond symbols) and phenylalanine
(square symbols) at various positions within a peptidyl sequence.

FIG. 41A-C provides an analysis of which arginine positions are favored
for phosphorylation by the kinase PAK using a "diverse basic proteomic set" of
peptides whose sequences are provided in Table 9. FIG. 41A shows the
procedure for a chi-square analysis to determine whether arginine at position P-3
(relative to a phosphorylation site) contributes to phosphorylation of the 16
positively phosphorylated peptides. FIG. 41B provides the relative
phosphorylation of 16 peptides from the diverse basic proteomic set of peptides
that have arginine at P-2 relative to the phosphorylated S or T. FIG. 41C shows
the p-values for analysis of R at all positions between P-6 and P+3; the results
demonstrate that R at P-2 is unique in its importance.

FIG. 42 shows that pPKC antibody binding requires the SHP-1 residue
S591 and that constitutively active PKC-theta (PKC-theta CA) can promote
phosphorylation of the S591 residue. In the absence of the S591 residue (when
using an S591A mutant), no phosphorylation by PKC-theta is detected.

FIG. 43A-B show that SHP-1 S591 is phosphorylated in T-cells in
response to CD3/28 or PMA. Constructs with wild type or S591A mutant SHP-1
sequences fused to GFP sequences were transfected into JURKAT or mouse
thymocyte cells and SHP-1 phosphorylation was detected by western blot using
an antibody specific for the phosphorylated SHP-1 S591 site (the "anti-pS591
antibody"). As shown in FIG. 43A, the presence of serine at position 591 in
SHP-1 is needed for phosphorylation. When alanine is present at position 591,
no phosphorylation is detected with the anti-pS591 antibody. FIG. 43B shows
that T cell activation (using CD3/28 antibodies or PMA) in either the JURKAT
cell line or in a mouse thymocyte preparation stimulates phosphorylation of the
S591 residue of SHP-1.

FIG. 44 shows that PKC inhibitors BIM I and BIM III interfere with
phosphorylation of SHP-1 at the S591 position.

FIG. 45A-D show that staining by anti-pS591 antibody is specific for
SHP-1 Ser-591. No staining is observed when the S591A mutant of SHP-1 is
expressed (FIG. 45B).

FIG. 46A-C shows that phosphorylation of SHP-1 S591 inhibits nuclear 
localization of SHP-1. 



Detailed Description of the Invention 

The invention relates to determination of the specificity of protein
kinases, to visual representation of specificity of kinases, to prediction of sites on
sequenced proteins that are most likely to be phosphorylated by each kinase
studied, to validation that peptides corresponding to those predicted sites are
indeed phosphorylated in vitro by each kinase studied, and to validation of
phosphorylation of those sites in vivo.

The term "kinase" (or "protein kinase") as used herein is intended to
include all enzymes that add a phosphate group to an amino acid residue within a
protein or peptide. Kinases that may be used in the methods of the invention
include protein-serine/threonine specific protein kinases, protein-tyrosine
specific kinases and dual-specificity kinases. Other kinases that can be used in the
method of the invention include protein-cysteine specific kinases,
protein-histidine specific kinases, protein-lysine specific kinases, protein-aspartic acid
specific kinases and protein-glutamic acid specific kinases.

A kinase used in the method of the invention can be a wild type or
mutant kinase. The kinases employed can be purified native kinases, for
example, a kinase purified from its native biological source. Kinases employed
can be from a variety of species. Some kinases that can be employed are
commercially available (e.g., protein kinase A from Sigma Chemical Co.).
Alternatively, a kinase used in the method of the invention can be a kinase
produced by creation of a nucleic acid construct and preparing the protein
product expressed in vitro or in whole cells (i.e., a "recombinantly produced
kinase"). Many kinases have been molecularly cloned and characterized and
thus can be expressed recombinantly by standard techniques. Hence, any
recombinantly produced kinase that retains its kinase function can be used in the
methods of the invention. If the recombinant kinase to be examined is a
eukaryotic kinase, it is generally preferable that the kinase be recombinantly
expressed in a eukaryotic expression system to ensure proper post-translational
modification of the protein kinase. Many eukaryotic expression systems (e.g.,
baculovirus and yeast expression systems) are known in the art and standard
procedures can be used to express a kinase recombinantly. A recombinantly
produced kinase can also be a fusion protein (i.e., composed of the kinase and a
second protein or peptide) as long as the fusion protein retains the catalytic
activity of the non-fused form of the kinase. Furthermore, the term "kinase" is
intended to include portions of native protein kinases that retain catalytic
activity. For example, a subunit of a multi-subunit kinase that contains the
catalytic domain of the kinase can be used in the methods of the invention.

One of skill in the art frequently uses a formula such as the following (I)
to represent the amino acid positions within a peptidyl site that may be
phosphorylated by a kinase:

(P-4) - (P-3) - (P-2) - (P-1) - P0 - (P+1) - (P+2) - (P+3) - (P+4)     (I)

where P0 is the phosphorylated position, P-1 is the amino acid position
immediately to the N-terminal side of P0, P+1 is the amino acid position
immediately to the C-terminal side of P0, P-2 is the amino acid position that is
two residues from P0 on the N-terminal side of P0, etc. This terminology will be
used herein as a general description of a kinase phosphorylation site and the
variables P-4, P-3 etc. will be used to refer to a particular amino acid position
within a kinase phosphorylation site.
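
The numbering convention of formula (I) can be illustrated with a short Python sketch. The function name `label_positions` and the six-residue example sequence are hypothetical and chosen only for illustration; they do not come from the application.

```python
def label_positions(peptide, p0_index):
    """Pair each residue with its label relative to the phosphorylated position.

    peptide  -- one-letter amino acid sequence, N-terminus first
    p0_index -- zero-based index of the phosphorylated residue (P0)
    """
    labels = []
    for i, residue in enumerate(peptide):
        offset = i - p0_index
        # Negative offsets lie on the N-terminal side, positive on the C-terminal side.
        name = "P0" if offset == 0 else f"P{offset:+d}"
        labels.append((name, residue))
    return labels


# Hypothetical sequence with serine at index 3 taken as P0.
site = label_positions("RRASVA", p0_index=3)
for name, residue in site:
    print(name, residue)
```

In this made-up example the first arginine is reported at P-3 and the valine at P+1.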

In general, key positions that determine kinase specificity are within
about four amino acids of the phosphorylated amino acid. However, positions
farther than four positions from the phosphorylation site can influence the
specificity of a kinase and can be characterized by the methods of the invention.

When one or more positions of a particular peptidyl sequence are
determined, a one letter amino acid symbol may be used herein to indicate what
amino acid is present at that determined position. The standard three-letter and
one-letter abbreviations for amino acids provided in Table 1 are used throughout
the application.

TABLE 1

Amino acid       3-Letter    1-Letter
Alanine          Ala         A
Arginine         Arg         R
Aspartic acid    Asp         D
Asparagine       Asn         N
Cysteine         Cys         C
Glutamic acid    Glu         E
Glutamine        Gln         Q
Glycine          Gly         G
Histidine        His         H
Isoleucine       Ile         I
Leucine          Leu         L
Lysine           Lys         K
Methionine       Met         M
Phenylalanine    Phe         F
Proline          Pro         P
Serine           Ser         S
Threonine        Thr         T
Tryptophan       Trp         W
Tyrosine         Tyr         Y
Valine           Val         V



The P0 position is the position that can be phosphorylated (the
"phosphorylatable position") and is generally either a serine (S), threonine (T) or
a tyrosine (Y) for human kinases. Hence, specific peptidyl sequences generally
discussed herein will often have S, T or Y at the P0 position. When any of a
defined set of amino acids is present at a given position, for example, when a
degenerate mixture of amino acids is used during synthesis of a peptide at that
position, a lower case "d" is used herein to represent the degeneracy of that
position. To represent peptides in which a residue is phosphorylated, a lower
case "p" is often used herein before the residue abbreviation; thus, pS or pSer
represents a phosphorylated serine residue, pT or pThr represents a
phosphorylated threonine, and pY or pTyr represents a phosphorylated tyrosine.

Design of single peptide test sets:

The invention provides for determination of the specificity of protein 
kinases by synthesis of test sets (and supersets) of peptides, subjecting the test 
sets (or supersets) to phosphorylation by a kinase of interest, and quantifying and 
analyzing the results.

Two simplified embodiments shown in FIG. 1 are used as examples of
the methods provided herein. FIG. 1A shows one test set of peptide pools (a
"P+1" test set) and FIG. 1B shows a second test set (a "P+2" test set). As used
herein, the name of a test set generally identifies which position is being
systematically varied (i.e., which position is the "query" position). Each peptide
of the two test sets illustrated in FIG. 1 has a "core" sequence comprised of
eleven amino acid residues. The term "core" is used to refer to amino acid
sequences that play a key role in determining kinase specificity and is used to
distinguish such key amino acids from N-terminal or C-terminal residues that are
incorporated to provide functions unrelated to determination of specificity (such
as for capture of the peptide onto a solid support or for quantification).

Four different types of amino acid positions can occupy the core 
positions in each of these peptides, as well as the other peptides described herein. 
These different types of amino acid positions are described below. 

1) A phosphorylatable amino acid position is a position occupied by an
amino acid to which a phosphate group can be added by a kinase. In eukaryotes
S, T, and Y are the primary phosphorylatable residues. However, in other
species residues such as histidine are also subject to phosphorylation. This
residue occupies the P0 position in each peptide pool in a test set. Hyphens (-)
may be used herein around the amino acid symbol in the P0 position (e.g., -S-) to
visually highlight this position. Note that the positions of the other types of amino
acid position in the core sequence are fixed relative to this P0 phosphorylatable
position for all peptide pools in a given test set, and that each amino acid
position is expressed relative to the P0 position.

2) An anchor amino acid position is a position in addition to the
phosphorylatable amino acid position having a determined amino acid that does
NOT vary from one peptide pool to another in the test set. More than one anchor
amino acid position can be present in a test set. The location of the anchor
amino acid positions and identity of the anchor amino acids at each anchor
position are identical for all peptide pools in the test set. For example in the
P+1 set shown in FIG. 1A, there is one anchor amino acid: an arginine (R) at
position P-3. In the P+2 set, there are two anchor amino acids: an arginine (R)
at P-3, and a phenylalanine (F) at P+1. The function of the anchor amino acid
positions is to provide sufficient favorable interaction between substrate and
kinase to permit measurable phosphorylation of each peptide pool. An anchor
amino acid is represented by a single letter amino acid code for the amino acid in
that anchor position.

3) A query amino acid position (or a varied position) is a position that is
being tested for its effect upon substrate phosphorylation. The symbol "?" is
often used herein as a symbol for identifying the query position. Unlike anchor
amino acid positions, there is generally only a single query amino acid position
within all peptide pools of a test set. In general, a query amino acid is
determined (i.e., not degenerate) for a particular peptide pool. However, the
query amino acid at that query position is systematically varied from peptide
pool to peptide pool within a test set of peptides. Hence, in contrast to the
anchor positions, the query or varied position is occupied by different residues
within the different peptide pools of a test set. The query or varied position is
boxed in FIG. 1. The function of the query or varied positions is to allow
assessment of the contribution of different amino acids to kinase specificity by
determining how each of the different tested amino acids influences the amount
of phosphorylation.

4) A degenerate position contains an undetermined amino acid selected
from a defined mixture of amino acids. More than one degenerate position is
typically present in a test set of peptide pools. For any given peptide pool in a
test set, all core positions that are not anchor, phosphorylatable or query
positions are degenerate positions. Thus, the presence of one or more degenerate
positions means that each peptide pool in a test set of peptides is actually a
complex mixture (or "library") of distinct peptides. Although each peptide pool
consists of many individual peptides, that peptide pool is often referred to herein
as a "peptide," in keeping with common usage in the literature. Measuring
phosphorylation of each such peptide pool assures that the assay reflects the
average behavior of a large number of individual sequences. The symbol "d" is
used herein as the symbol of a degenerate position in the test sets of peptide pools
provided herein.

In some embodiments, the query position is not adjacent to an anchor 
position within the test sets provided herein. In other embodiments, the query 
position is not adjacent to the phosphorylatable position.

FIG. 1 illustrates the symbolic representation of two test sets of peptides
designed for analysis of PKC specificity, and the corresponding peptide pools
synthesized for those test sets. The formula ddddRdd-S-?-dd describes the P+1
test set of peptides shown in FIG. 1, where serine is in the P0 position, the query
position is P+1, arginine is the anchor amino acid chosen for an anchor position
at P-3 and the remaining amino acid positions are degenerate. Similarly, the formula
ddddRdd-S-F-?-d describes the P+2 test set of peptides shown in FIG. 1,
where: serine is in the P0 position; the query position is P+2; arginine is the
anchor amino acid chosen for an anchor position at P-3; phenylalanine is an
anchor amino acid chosen for a second anchor position at P+1; and the
remaining amino acid positions are degenerate (d).
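
As a minimal sketch of how such test-set formulas can be read programmatically, the Python function below assigns one of the four position types to each core position. It assumes that the residue immediately after the first hyphen is P0, which holds for the two formulas quoted above; the function name `parse_test_set` and the wording of the role strings are illustrative assumptions.

```python
def parse_test_set(formula):
    """Map each core position of a test-set formula to its position type.

    Assumption: the residue immediately following the first hyphen is P0,
    as in "ddddRdd-S-?-dd" and "ddddRdd-S-F-?-d".
    """
    p0_index = formula.index("-")        # number of symbols that precede P0
    symbols = formula.replace("-", "")   # hyphens are only visual highlights
    roles = {}
    for i, symbol in enumerate(symbols):
        offset = i - p0_index
        label = "P0" if offset == 0 else f"P{offset:+d}"
        if offset == 0:
            roles[label] = f"phosphorylatable ({symbol})"
        elif symbol == "?":
            roles[label] = "query"
        elif symbol == "d":
            roles[label] = "degenerate"
        else:
            roles[label] = f"anchor ({symbol})"
    return roles


p1_set = parse_test_set("ddddRdd-S-?-dd")
p2_set = parse_test_set("ddddRdd-S-F-?-d")
print(p1_set["P-3"], "|", p1_set["P+1"])
print(p2_set["P+1"], "|", p2_set["P+2"])
```

Both formulas resolve to eleven core positions, with the query at P+1 in the first set and at P+2 in the second.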

Each test set in the embodiments shown in FIG. 1 consists of 13 peptide
pools. The residue present at the query position in each peptide pool in a test set
is systematically varied. However, the fixed anchor positions within all peptide
pools of the test set provide at least a minimal level of kinase recognition and
phosphorylation for each peptide in the test set. At the remaining core positions,
an amino acid selected from a degenerate mixture of amino acids is used.

Analysis of kinase specificity by phosphorylation of test sets

Determination of kinase specificity is made by phosphorylating the test
sets of peptides with a kinase of interest. Methods of the invention for
determining the substrate specificity of a kinase generally involve contacting
each peptide pool in at least one test set of peptide pools with a kinase and a
γ-labeled ATP, quantifying the amount of label incorporated into each peptide
pool, and comparing the quantity of label incorporated into a peptide pool with
the quantity of label incorporated into at least one other peptide pool.

Hence, a test set of peptides is synthesized, for example, the P+1 test set
having the thirteen sequences shown in FIG. 1 panel A. The synthesized peptide
pools in the test set are reconstituted to standardized concentrations, and
replicate samples of the peptide pools are contacted with a kinase under assay
conditions that permit phosphorylation at the P0 position. The amount of
phosphorylation of each peptide pool can be determined, for example, by
observing the radioactivity incorporated into the peptide pool after using
γ-32P-ATP as a donor of the phosphate group during the phosphorylation assay.

FIG. 1 panel A provides results of such a phosphorylation assay for the
P+1 test set of peptides. The "raw data" are measured as counts per minute
(cpm). As shown in FIG. 1, marked variation exists in the amount of
phosphorylation present in different peptide pools of a test set, reflecting
important contributions of the single residue by which they differ. Furthermore,
the SEMs (standard error of the mean of replicate values) are small, indicative of
good assay agreement between replicates.
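
For illustration, the mean and SEM of replicate cpm readings for one peptide pool might be computed as in the following Python sketch. The replicate values are invented, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def mean_and_sem(replicates):
    """Return the mean and the standard error of the mean of replicate readings."""
    mean = statistics.mean(replicates)
    # Assumption: SEM = sample standard deviation / sqrt(number of replicates).
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, sem


# Invented triplicate cpm readings for a single peptide pool.
mean_cpm, sem_cpm = mean_and_sem([1000.0, 1100.0, 900.0])
print(f"mean = {mean_cpm:.0f} cpm, SEM = {sem_cpm:.1f} cpm")
```

A small SEM relative to the mean, as in this made-up triplicate, corresponds to the good agreement between replicates described above.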

In some embodiments, the determination of residue preference is made
by comparing the cpm incorporated into each peptide with the geometric mean
cpm incorporated for all the peptides in the set. That ratio is shown in FIG. 1
within the column labeled 'Ratio-to-Mean.' The Ratio-to-Mean is also referred
to herein as residue preference. A Ratio-to-Mean greater than 1.0 indicates that
the selected query residue in the corresponding peptide is preferred by the kinase
over the other types of query residues tested. For example, a Ratio-to-Mean of
2.9 was observed for 'F' in the P+1 test set, indicating that phenylalanine at P+1
is highly preferred by the kinase used for this assay (PKC-theta). A ratio less
than 1.0 indicates that the selected query residue in the corresponding peptide
pool is disfavored compared to the other residues tested. For example, a ratio of
0.4 was obtained for 'D' in the P+1 test set, indicating that aspartic acid at P+1 is
disfavored by the kinase used for this assay. To visually emphasize the preferred
residues, the log scores in FIG. 1 for favored residues with residue preferences
greater than 1.5 are in bold and underlined. In contrast, data relating to
disfavored residues are bold without underlining, indicating that the residue
preference is less than 0.67 (i.e. 1.0 divided by 1.5).

A value called 'Log Score' (also called Log2 Score) was calculated for
each residue by determining the log (base 2) of the Ratio-to-Mean. As a result
of this mathematical transformation, favored residues have a positive score, and
disfavored residues have a negative score. This score obviously differs
depending on the position of the residue in the peptide (compare the P+1 test set
in FIG. 1A with the P+2 test set in FIG. 1B). Hence, each value represents a
position-specific score for a particular amino acid residue. As indicated in FIG.
1 panel A, arginine, lysine, phenylalanine and leucine are preferred residues at
the P+1 position for the kinase tested (PKC-theta). In contrast, aspartic acid,
asparagine, proline, glycine and alanine are disfavored at the P+1 position for the
kinase tested (PKC-theta).
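
The Ratio-to-Mean and Log Score calculations described above can be sketched in Python as follows. This is an illustrative reimplementation, not the spreadsheet formulas used in the application; the function names and the three cpm values are invented, while the 1.5 and 1/1.5 thresholds are the ones stated in the text.

```python
import math


def residue_preferences(cpm_by_residue):
    """Return {residue: (ratio_to_mean, log2_score)} for one test set."""
    logs = [math.log(cpm) for cpm in cpm_by_residue.values()]
    geometric_mean = math.exp(sum(logs) / len(logs))
    result = {}
    for residue, cpm in cpm_by_residue.items():
        ratio = cpm / geometric_mean
        result[residue] = (ratio, math.log2(ratio))
    return result


def classify(ratio, threshold=1.5):
    """Apply the highlighting thresholds of FIG. 1 (1.5 and 1.0 / 1.5)."""
    if ratio > threshold:
        return "favored"
    if ratio < 1.0 / threshold:
        return "disfavored"
    return "neutral"


# Invented cpm values for a three-residue query set (not data from FIG. 1).
example_cpm = {"F": 8000.0, "A": 2000.0, "D": 500.0}
prefs = residue_preferences(example_cpm)
for residue, (ratio, score) in prefs.items():
    print(residue, round(ratio, 2), round(score, 2), classify(ratio))
```

Because the ratios are taken against the geometric mean, the Log Scores of a test set sum to zero, so favored and disfavored residues are scored symmetrically.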

The invention provides computer-executable instructions for performing
the calculations described above. One preferred embodiment uses software tools
enabled by use of a spreadsheet application such as Microsoft Excel running on
an operating system such as Windows 2000 on a hardware platform such as a Dell
Latitude using a microprocessor such as an Intel Pentium chip. For example, a
spreadsheet is customized for a given superset of test peptides; manipulation of
that data is provided by formulas embedded in that spreadsheet. Output of
counts per minute from a TopCount NXT Microplate Scintillation and
Luminescence Counter in a 96 well plate format was input into the
spreadsheet. The results are displayed to the user in the spreadsheet; FIG. 3,
FIG. 4 and FIG. 5 are screen captures from such a spreadsheet. In one
embodiment additional processing of data is provided by automation of
additional functions in the spreadsheet using the language Visual Basic for
Applications, which is embedded in the Excel application; in other embodiments
additional automation is provided by software objects exposed by the Excel
interface and manipulated by software external to Excel, such as Microsoft
Visual Basic. This embodiment uses this same computational infrastructure for
performing the manipulations described in Example 3.

Thus, the invention provides a computer readable medium having
computer-executable instructions for determining quantitative values describing
the preference of a kinase for a defined amino acid at a defined substrate position
wherein the input data comprises experimental data on phosphorylation of a test
set of peptides comprising at least two peptide pools, wherein every peptide in
each of the peptide pools comprises one phosphorylatable amino acid position
and one query amino acid position, wherein each peptide of every peptide pool has
an identical phosphorylatable amino acid that can be phosphorylated by a kinase
at the phosphorylatable amino acid position and the query amino acid position is
at a defined position relative to the phosphorylatable amino acid position within
every peptide of every peptide pool but a query amino acid's identity at the
query amino acid position is systematically varied from one peptide pool to the
next peptide pool within the test set of peptide pools.

Supersets constructed from multiple test sets

The test sets illustrated in FIG. 1 provide information on positions P+1
and P+2, based on the location of the query position relative to the
phosphorylatable residue. In general, all positions within a test substrate
can separately be made into query positions by constructing a test set of peptides
for each query position. Hence, one of skill in the art can make, for example,
P-7, P-6, P-5, P-4, P-3, P-2, P-1, P+1, P+2, P+3, P+4, P+5, P+6 and P+7 test sets
of peptides and systematically vary the type of amino acid at each of these query
positions. Such a large collection of test sets of peptide pools with query residues at
substantially all the different positions is referred to as a superset. In some
embodiments, each position close to the phosphorylation site (P0) will be a
query position and the appropriate test sets of peptides within the superset will
be made and tested to ascertain which amino acid is preferred by the kinase at
those query positions. FIG. 2 shows such a superset of test sets of peptides
designed and synthesized to test the specificity of PKC and related kinases at all
query positions from P-4 to P+3. This superset includes the two test sets shown
in FIG. 1 together with six other test sets.

Such supersets are phosphorylated by a kinase of interest as described for 
the test sets above. FIG. 3 shows the raw data (cpm) obtained for a 
representative experiment testing PKC-theta on the superset shown in FIG. 2. 
FIG. 4 shows the Ratio-to-Mean for that data, calculated as described above. 
FIG. 5 shows the Log (base 2) score for that data, calculated as described above. 
Taken together, the scores derived from analysis of a superset of peptides (e.g. 
FIG. 5) constitute a position-specific scoring matrix (PSSM) describing the 
residue preference of the selected kinase at different positions around the 
phosphorylation site. 

A reduced set of amino acid residues can be used in the query position of
the test sets of peptides. Experimental data obtained for such reduced sets of
query amino acids do not provide information for all naturally occurring
residues. In some embodiments, data that is not obtained experimentally can be
estimated from existing data. For example, the lower boxed region shown in
FIG. 5 provides extrapolated data for residues that were not tested, but that have
similar physicochemical properties to the residues tested. Thus, in this case data
for glutamic acid (E) was inferred from aspartic acid (D), data for isoleucine (I),
methionine (M) and valine (V) was inferred from leucine (L), and data for tyrosine
(Y) was inferred from phenylalanine (F). Where cysteine was excluded from the
residues analyzed, a score for cysteine was likewise created from scores for other
residues. Such extrapolation can be accomplished in a variety of ways, for
example, by assigning a score of zero, or assigning the score corresponding to
other residues such as alanine. The accuracy of these extrapolated scores can
then be tested as described below (Example 2).
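
A minimal Python sketch of this kind of extrapolation is given below. The proxy pairs (E from D; I, M and V from L; Y from F) are those named in the text; the choice of zero for cysteine is just one of the options mentioned, and the function name, dictionary layout and scores are illustrative assumptions.

```python
# Untested residue -> tested residue with similar physicochemical properties.
PROXY = {"E": "D", "I": "L", "M": "L", "V": "L", "Y": "F"}


def extrapolate(scores, proxy=PROXY, cysteine_default=0.0):
    """Fill in log scores for untested residues at one substrate position.

    scores -- {residue: log2_score} for the residues actually tested.
    Measured scores are never overwritten.
    """
    filled = dict(scores)
    for untested, tested in proxy.items():
        if untested not in filled and tested in filled:
            filled[untested] = filled[tested]
    # Assumption: cysteine, if untested, receives a neutral score of zero.
    filled.setdefault("C", cysteine_default)
    return filled


# Invented log scores for three tested residues at one position.
measured = {"D": -1.3, "L": 0.9, "F": 1.5}
completed = extrapolate(measured)
print(sorted(completed.items()))
```

Passing the score of another residue such as alanine as `cysteine_default` would correspond to the second option mentioned above.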

The method of the invention is flexible so that greater or lesser numbers
of test sets can be included for testing as many positions as desired. For
example, FIG. 6 lists the sequences of a superset of peptide pools designed to
extend the analysis of PKC specificity to include positions P-7 through P-5 and
P+4 through P+6. FIG. 7 shows an extended position-specific scoring matrix for
positions P-7 through P-5 and P+4 through P+6 derived from testing PKC-theta
with the test sets shown in FIG. 6. Taken together, the scores from FIG. 5 and
FIG. 7 provide a position-specific scoring matrix for PKC-theta for positions P-7
to P+6. The ability to combine results from different sets and different
experiments is a convenient aspect of the invention.
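
Combining matrices from separate supersets can be sketched as a simple merge keyed by substrate position. In the Python sketch below a PSSM is represented as a dictionary mapping a position offset (for example -3 for P-3) to a dictionary of log scores; this representation, the function name and the scores are assumptions made for illustration.

```python
def combine_pssms(*matrices):
    """Merge position-specific scoring matrices that cover different positions.

    Each matrix is {position_offset: {residue: log2_score}}.
    """
    combined = {}
    for matrix in matrices:
        for position, scores in matrix.items():
            if position in combined:
                # Assumption: overlapping positions are treated as an error
                # rather than averaged, so no measurement is silently replaced.
                raise ValueError(f"position {position} appears in more than one matrix")
            combined[position] = dict(scores)
    return combined


# Invented fragments standing in for a core matrix and an extended matrix.
core = {-1: {"R": 1.0, "D": -0.8}, 1: {"F": 1.5, "D": -1.3}}
extended = {-5: {"R": 0.3}, 4: {"K": 0.2}}
full = combine_pssms(core, extended)
print(sorted(full))
```

Raising an error on overlapping positions is a design choice of this sketch; averaging replicate measurements would be an equally plausible policy.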

Visual representation of kinase specificity

An efficient strategy for visual representation of specificity information
is important for conceptualizing and communicating findings on kinase
specificity. A previously described method for visualizing peptide specificity
data is via the Sequence Logo developed by Thomas Schneider (Schneider TD et
al. 1990. Nucleic Acids Res. 18:6097-6100). In that article, the method is
described as follows: "The height of each letter is made proportional to its
frequency, and the letters are sorted so the most common one is on top. The
height of the entire stack is then adjusted to signify the information content of
the sequences at that position." This visualization method is illustrated on the
left side of FIG. 8 for a published Sequence Logo generated by the Schneider
method for protein Kinase C (PKC) (Kreegipuu A et al. 1998. FEBS Lett.
430:45-50).

The invention provides a new method for visualizing which amino acids
are preferred in the substrate of a kinase. This method involves use of a position
specific residue scoring matrix (PSSM) to generate a PSSM Logo. Each
position in a PSSM is represented in a PSSM Logo by a vertical stack of amino
acid residue single letter codes. The height of each code is made proportional to
the absolute value of a Log Score, and the positions of the codes in the stack are
sorted from bottom to top in ascending value by the quantitative parameter. An
example of a PSSM Logo of the invention is provided on the right side of FIG.
8, which illustrates the results for analysis of PKC-theta with peptide pools
shown in FIG. 2 and FIG. 6. In the preferred embodiment, each single letter
code is colored to indicate the physico-chemical properties of the corresponding
residue; for example R, K, H could be blue to indicate basic, D, E red to indicate
acidic, I, L, M, V, F, Y could be grey to indicate hydrophobic.
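
The stacking rule just described can be sketched in Python as follows. The sketch computes, for one substrate position, the vertical offset and height of each letter; it assumes that all letters are stacked upward from a common baseline, which the text does not specify, and the function name and scores are illustrative.

```python
def logo_stack(scores, scale=1.0):
    """Lay out one PSSM Logo column.

    scores -- {residue: log2_score} for one substrate position.
    Returns a list of (residue, bottom, height) tuples, lowest letter first.
    """
    # Ascending order: the most disfavored residue sits at the bottom of the
    # stack and the most favored residue at the top.
    ordered = sorted(scores.items(), key=lambda item: item[1])
    stack = []
    bottom = 0.0
    for residue, score in ordered:
        height = abs(score) * scale   # height reflects the magnitude of the score
        stack.append((residue, bottom, height))
        bottom += height
    return stack


# Invented log scores for a single position.
column = logo_stack({"F": 1.5, "D": -1.0, "A": 0.0, "R": 0.5})
for residue, bottom, height in column:
    print(residue, bottom, height)
```

With this layout both strongly favored and strongly disfavored residues appear as tall letters, at the top and bottom of the stack respectively, while near-neutral residues collapse toward zero height.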

Two major differences exist between the previously available Sequence
Logo and a PSSM Logo of the invention. The most fundamental difference
between a Sequence Logo and a PSSM Logo is that the PSSM Logo visually
emphasizes the residues that are disfavored by the kinase as well as the ones that
are favored by the kinase. In contrast, the Sequence Logo only emphasizes the
residues that are favored. Such distinction is not a trivial distinction, but rather
represents a fundamental difference in emphasis between the method of the
invention and those of prior workers. In particular, the present methods
accurately determine which amino acid residues are disfavored, which has not
previously been emphasized and which can be a controlling factor in
determining kinase specificity (see below).

A secondary difference between the previously available Sequence Logo
and a PSSM Logo of the invention is in the parameters represented by the PSSM
Logo versus those represented by the Sequence Logo. The Sequence Logo, as
described by Schneider, is determined by a combination of the parameters
referred to as 'information content' of that position, and of the residue
frequency. In contrast, in a preferred embodiment, the PSSM Logo reflects the
log scores obtained by the methods of the invention, which are not
interchangeable with residue frequency. In other embodiments, the parameter
represented in the PSSM Logo is the log of the ratio of [residue
frequency]/[control residue frequency]. Hence, the PSSM Logo is distinct from
the Sequence Logo.
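
For the alternative parameter mentioned above, a hedged Python sketch is shown below. It assumes a base-2 logarithm (the text says only "the log") and skips residues whose observed or control frequency is zero; the function name and the frequencies are invented.

```python
import math


def log_frequency_ratio(observed, control):
    """Return {residue: log2(observed frequency / control frequency)}.

    Assumption: base-2 logarithm; residues with a zero or missing frequency
    in either set are omitted because the ratio is undefined for them.
    """
    scores = {}
    for residue, frequency in observed.items():
        reference = control.get(residue, 0.0)
        if frequency > 0 and reference > 0:
            scores[residue] = math.log2(frequency / reference)
    return scores


# Invented frequencies at one position in a set of sites and in a control set.
ratio_scores = log_frequency_ratio(
    observed={"R": 0.20, "D": 0.025, "W": 0.0},
    control={"R": 0.05, "D": 0.05, "W": 0.01},
)
print(ratio_scores)
```

Like the Log Score, this parameter is positive for over-represented residues and negative for under-represented ones, so it can be drawn with the same stacking rule.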

Note that use of a PSSM Logo is not restricted to findings of kinase
specificity, but rather is generally useful for expressing results pertaining to
amino acid residue preference. Thus, for example, results of other experimental
methods for determination of residue preference for peptide binding (rather than
phosphorylation) can equally well be represented with a PSSM Logo. Moreover,
nucleotide sequence preferences can also be represented using a PSSM Logo.

One embodiment uses software tools enabled by use of a spreadsheet
application such as Microsoft Excel running on an operating system such as
Windows 2000 on a hardware platform such as a Dell Latitude using a
microprocessor such as an Intel Pentium chip. Software objects exposed by the
Excel interface are manipulated by software external to Excel, such as Microsoft
Visual Basic. Information in the spreadsheet for each substrate position consists
of paired columns, one comprising the residue code and one comprising the log2
scores. Rows in that pair of columns are sorted in descending order by log2
scores. That sorted information is converted into a file of commands in the
PostScript programming language, which instruct a PostScript printer (such as
a Xerox Phaser 6200 printer) to create symbols of the appropriate size and
position in a column. Successive columns in the PSSM are processed similarly,
and the PostScript code instructs the printer to move horizontally to position
information on each successive substrate position into adjacent columns.

Thus, the invention provides a computer readable medium having
computer-executable instructions for performing a method of visually displaying
amino acid or nucleotide sequence preferences, the method comprising:
representing a position in a peptide or a nucleic acid sequence with a stack of
single letter symbols for amino acids or nucleotides; and displaying one or more
stacks of letters wherein each symbol's height is proportional to the absolute
value of a quantitative parameter that is positive for favored amino acids or
nucleotides and negative for disfavored amino acids or nucleotides and wherein
each symbol's position within the stack is sorted from bottom to top in
ascending value by the quantitative parameter.
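
The stacking rule recited above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the unit scale factor, and the example scores are hypothetical, and the sketch computes layout coordinates rather than emitting printer commands.

```python
def layout_stack(scores, scale=1.0):
    """Lay out one stack of single-letter symbols.

    scores maps a residue (or nucleotide) letter to a signed score.
    Symbols are ordered bottom to top in ascending score, and each
    symbol's height is proportional to the absolute value of its score.
    Returns a list of (letter, y_bottom, height) tuples.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1])
    y = 0.0
    stack = []
    for letter, score in ordered:
        height = abs(score) * scale
        stack.append((letter, y, height))
        y += height
    return stack

# Hypothetical scores for one substrate position.
stack = layout_stack({"R": 2.0, "D": -1.5, "L": 0.5})
```

With this ordering the most disfavored residue sits at the bottom of the stack and the most favored at the top, and both are drawn large, so strongly disfavored residues are as visible as strongly favored ones.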

An overview of the hardware and the operating environment in conjunction
with which embodiments of the invention can be practiced is also provided.
Figure 35 is a diagram of a computerized system in
conjunction with which embodiments of the invention may be implemented.
Thus, in one embodiment, computer 110 is operatively coupled to a monitor 112,
a pointing device 114 and a keyboard 116. Computer 110 includes a central
processing unit 118, random-access memory (RAM) 120, read-only memory
(ROM) 122, and one or more storage devices 124, such as a hard disk drive, a
floppy disk drive, a compact disk read-only memory (CD-ROM), an optical disk
drive, a tape cartridge drive or the like. RAM 120 and ROM 122 are collectively
referred to as the memory of computer 110. The memory, hard drives, floppy
disks, etc., are types of computer-readable media. The computer-readable media
provide nonvolatile storage of computer-readable instructions, data structures,
program modules and other data for computer 110. The invention is not
particularly limited to any type of computer 110.

Monitor 112 permits the display of information for viewing by a user of
the computer. Pointing device 114 permits the control of the screen pointer
provided by the graphical user interface of window-oriented operating systems
such as the Microsoft Windows family of operating systems. Finally, keyboard
116 permits entry of textual information, including commands and data, into
computer 110.

The computer 110 operates as a stand-alone computer system or operates
in a networked environment using logical connections to one or more remote
computers, such as remote computer 126 connected to computer 110 through
network 128. The network 128 depicted in Figure 34 comprises, for example, a
local-area network (LAN) or a wide-area network (WAN). Such networking
environments are common in offices, enterprise-wide computer networks,
intranets, and the Internet.

An example hardware and operating environment in conjunction with
which embodiments of the invention can be practiced has been described.

Validation of the results obtained using the methods described

One of the principal uses for the methods of the invention is to predict
sites of phosphorylation in proteins whose sequences are known but whose
phosphorylation sites are unknown. The ability to correctly predict
phosphorylation sites will depend on the correctness of the methods employed.
If the values for residue preference for a kinase are incorrect, then the
predictions are unlikely to be correct. As described herein, a PSSM generated by
the methods of the invention will generally provide better and more complete
substrate specificity information than previously employed methods and
predictions.

Rather surprisingly, systematic validation has not been reported for
previously reported predictive algorithms, such as those proposed by U.S. Patent
6,004,757 to Cantley et al. For example, Nishikawa K et al. 1997. J Biol Chem
272:952-960 describes an approach for determining peptide specificity for PKC,
but the validation provided was limited to a showing that the optimal peptides
predicted for two different kinases are preferentially phosphorylated by their
respective kinases. No validation was provided that the sequence identified was
the best sequence, or that good in vitro substrates can be identified by using the
remainder of the information derived from the technique. While Cantley and
co-workers also propose that the results of such predictions correlate with
physiologically relevant sites, such assertions are based on a modest correlation
with anecdotal results from the literature.

One approach to validating a substrate identification method can involve,
for example, comparison of substrate sites predicted by the method with in vitro
phosphorylation results obtained using the selected kinase and peptides of
known sequences. Such a systematic validation has been performed for the
methods described herein. For example, a panel of seventy-five peptides was
synthesized, the phosphorylation observed for each peptide was experimentally
measured, the amount of phosphorylation was quantified, the phosphorylation
results for each peptide were normalized to the phosphorylation observed with
the best substrate tested, and these amounts were compared with predictions
made according to the invention and according to the procedures provided by
others. These peptides are referred to herein as proteomic peptides because their
sequences are chosen from proteins in the human proteome; unlike the test sets
employed herein, these peptides include no degenerate positions.

Fairness of a validation strategy requires that the choice of test peptides
not be unfairly biased by findings from the PSSM being validated. The choice
of the peptides in Table 2 was not biased by information from the PSSM-based
scoring illustrated herein because the peptides were chosen and synthesized
more than five months before the method was established. The dominant criterion
for selection of the peptides was computerized scanning of human protein
sequences amongst NCBI reference sequences (see website at ncbi.nlm.nih.gov/)
to identify sites with an abundance of positively charged residues in positions
P-3 to P+3 relative to a potential P0 phosphorylation position (S or T), and with
good diversity in the P-1 and P+1 positions.
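
The selection criterion described above (scanning sequences for S or T sites flanked by many positively charged residues) might be sketched as follows. This is a hypothetical illustration, not the program actually used: the function name, the treatment of only R and K as positively charged, and the threshold of three basic residues are all assumptions, and the diversity requirement at P-1 and P+1 is not modeled.

```python
def candidate_sites(seq, min_basic=3, flank=3):
    """Find S/T positions with many basic residues from P-3 to P+3.

    Returns (index, residue, number_of_basic_flanking_residues) for each
    S or T whose flanking window (excluding P0 itself) contains at least
    min_basic residues that are R or K. min_basic is an assumed cutoff.
    """
    sites = []
    for i, residue in enumerate(seq):
        if residue not in "ST":
            continue
        start = max(0, i - flank)
        window = seq[start:i] + seq[i + 1:i + 1 + flank]
        n_basic = sum(window.count(basic) for basic in "RK")
        if n_basic >= min_basic:
            sites.append((i, residue, n_basic))
    return sites

# Sequence of SEQ ID NO:8 from Table 2; indices are 0-based.
hits = candidate_sites("AARKKRISVKKKQEQ")
```

For this peptide the only reported site is the serine at index 7, whose window KRI/VKK contains four basic residues.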

The results of this analysis for phosphorylation are provided in Table 2.
While the results provided in Table 2 show measured phosphorylation by
PKC-delta, the PKC-delta predictions made by the methods of the invention (shown
in Table 2) were actually based upon data obtained with PKC-theta. In contrast,
data generated by the methods of Cantley and co-workers were available for
PKC-delta (Nishikawa K et al. 1997. J Biol Chem 272:952-960; and Scansite at
scansite.mit.edu). Because the predictions from the present methods are based
on PKC-theta, which is distinct from PKC-delta but is the PKC isoform closest
to PKC-delta, the comparison provided in Table 2 is biased in favor of the
method provided by Cantley and co-workers. Despite this bias, the results
demonstrate that predictions made by the methods of the invention are better
than predictions made by the methods of Cantley and co-workers (Scansite).



Table 2: Validation of the Present Methods
Comparison of Present Method vs. Scansite Predictions

SEQ     Sequence                  Prediction (percentile)         Measured in vitro
ID                                Invention       Scansite        phosphorylation
NO:                               for PKC-theta   for PKC-delta   by PKC-delta

1       HVRRRRGTFKRSKLRARD        0               0.26            100
2       KKKKRASFKRKSSKKG          0               0.01            76
3       NRKKKRTSFKRKA             0.1             0.05            66
4       KFARKSTRRSIRLPE           0.9             4.29            52
5       RQRKRKLSFRRRTDKD          0               0.35            42
6       PRLIRRGSKKRPAR            0               >5              40
7       RKIPKRPGSVHRTPSRQ         0.2             4.23            38
8       AARKKRISVKKKQEQ           0.2             0.04            35
9       QKKSRLRRRASQLKI           0.1             3.83            34
10      AQIVKRASLKRGKQ            0.5             0.03            32
11      KKKFRTPSFLKKSKK           0.4             1.52            25
12      KKKICKRFSFKKSFKL          0.2             0               24
13      WKGKRRSKARKKRK            2.5             >5              22
14      EYLERRASRRRAV             0.1             >5              20
15      RGFLRSASLGRRASFHLE        0               0.41            18
16      DGQKRKKSLRKKLD            0               >5              17
17      AGWRKKTSFRKPKED           0.2             0.75            17
18      KKRFSFKKSFKLSGFSFKKN      0.2             0.01            16
19      AGSFKRNSIKKIV             0.3             1.69            14
20      GAPPRRSSERNAH             0.4             >5              13
21      KLAVGRHSFSRRSGV           0.5             >5              12
22      LLKKRDSFRTPRDSKLE         2.5             2.51            12
23      QKRHARVTVKYDRRE           1.5             4.49            10
24      EKIKRSSLKKVDSLKK          1.5             0.02            10
25      EILSRRPSYRKILND           0.1             >5              9
26      ALRRPSLRREADD             0.2             >5              9
27      KKRKKKSSKSLAHA            2.7             0.02            8
28      KRPGKKGSNKRPGKR           4               0.48            8
29      RKNDRKKRYTVVGNP           >5              >5              8
30      KEWRTDSLKGRRGR            1.5             >5              7
31      RKKRKKKSSKSLAHAGVALA      2.7             0.02            >5
32      KATTKKRTLRKNDRK           1.7             0.48            >5
33      QQKIRKYTMRRLLQE           0.5             >5              >5
34      EGGDRRASGRRK              2.1             >5              5
35      GLLDRKGSWKKLDDM           2.1             3.26            4
36      GENVLKKSMKSRVKG           5.2             >5              4
37      AYIERMNSIHRDLRA           3.1             >5              3
38      NYLRRRLSDSNFMAN           0.9             >5              3
39      LLGSGKVTDRKAL             >5              >5              3
40      NMEAKKLSKDRMKJCY          >5              >5              3
41      FVHQASFKFGQGD             1.5             0.04            3
42      QPEGLRSLKKPDRKKR          >5              >5              3
43      AWVTVHEKJCSSRKSEYL        4.2             2.95            3
44      VLAKKGTSKTPVPE            >5              2.43            2
45      VFREHQRSGSYHVRE           0.1             >5              2
46      GQAWGRQSPRRLED            >5              >5              2
47      ARI1GEKSFRRSWG            2.7             0.69            2
48      AVNSRRRAGQKKK             5               >5              2
49      VQQLLRSSNRRLEQL           >5              >5              2
50      ENLRRVATDRRHLGH           0.8             >5              2
51      DLLGKKVSTKTLSEDD          >5              4.05            2
52      HKHSPEKRGSERKEG           >5              >5              2
53      AKNLKTLQKRDSFIG           >5              0.41            2
54      ENLRKVTTDKKSLAY           >5              0.01            2
55      DDMEHKTLKJTDFG            1.5             >5              2
56      EARLGAASLKFGARD           >5              0.01            2
57      KNWKLLSSRRTQDR            >5              4.49            2
58      RVKLGTLRRPEGP             >5              4.05            1
59      [illegible]               4.1             0.18            [illegible]
60      LRRKHLGTLNFGGIR           >5              0               [illegible]
61      VDNILKKSNKKLEEL           5.3             >5              [illegible]
62      AVRDMRQTVAVGVIK           >5              0.84            [illegible]
63      QRQERIFSKRRGQDF           3.4             >5              [illegible]
64      ALRAPKPTLRYFTTERF         >5              0               [illegible]
65      IKVTHKATGKVMVMK           >5              >5              [illegible]
66      GFAKKIGSGQKTWTF           >5              0.15            [illegible]
67      AINSRETMFHKERFK           >5              >5              [illegible]
68      RGEGHKPSIAHRDFK           >5              >5              [illegible]
69      LALTARESSVRSGGAG          >5              0               [illegible]
70      HERKGSDKRGDNQ             4.1             >5              [illegible]
71      RRRQKJRRTGALVLSRGGKR      >5              >5              1
72      LTDPKEDPIYDEPEGLAPVPG     >5              >5              0
73      IDYYKKTTNGRLPVK           >5              >5              0
74      IDYYKKTSNGRLPVK           >5              >5              0
75      EEAEHKATKARLADK           >5              >5              0

Two steps are involved in the validation process: making the predictions,
then assessing the predictions by comparison with measured values. When a
PSSM is obtained by the methods of the invention, the calculation of a
prediction is straightforward, using the algorithms described herein (see, e.g.,
Example 3).

Table 2 compares the present predictions with actual measurements of
phosphorylation on validating peptides. The method of synthesis of the
validating peptides was as described elsewhere in the application, and each
included an N-terminal linker sequence of biotinylated-Lys-dansylated-Lys-Pro-
Pro-Gly (SEQ ID NO:231). The length of the remaining "core" of the validating
peptides ranged from 12-21 residues with one to five S/T residues. In vitro
phosphorylation of these validating peptides was measured in the manner
described herein. Measurements were obtained by phosphorylation of the
validating peptides with PKC-delta at a peptide concentration of 10 nM. In vitro
phosphorylation results for the validating peptides were expressed as normalized
values, namely as a percentage of phosphorylation of the best validating peptide
substrate in the group. Hence, a higher value for the measured in vitro
phosphorylation of a validating peptide indicated that the validating peptide was
phosphorylated to a greater extent than a validating peptide with a lower
phosphorylation value.
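
The normalization described above can be written as a short Python sketch. The function name and the raw readings are hypothetical and serve only to show the arithmetic: each value is divided by the value for the best substrate in the group and expressed as a percentage.

```python
def normalize_to_best(raw):
    """Express each raw phosphorylation value as a percentage of the best one.

    raw maps a peptide identifier to its measured phosphorylation
    (for example, counts per minute).
    """
    best = max(raw.values())
    if best <= 0:
        raise ValueError("no peptide shows measurable phosphorylation")
    return {peptide: 100.0 * value / best for peptide, value in raw.items()}

# Hypothetical raw readings for three validating peptides.
normalized = normalize_to_best({"pepA": 2000.0, "pepB": 500.0, "pepC": 0.0})
```

Here the best substrate is assigned 100, and a peptide with one quarter of its signal is assigned 25.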

Many of the peptides employed (Table 2) have multiple serine/threonine
residues; the score for such a peptide was determined by scoring each Ser/Thr in
the peptide, and the lowest (i.e. best) percentile among all residues that could
be phosphorylated was taken as the percentile for the peptide.
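
The per-peptide rule just described (score every Ser/Thr, keep the lowest percentile) can be sketched as follows. The per-site scoring function is passed in as a parameter because the PSSM-based percentile calculation itself is described elsewhere in the specification; the stand-in scorer and its values below are hypothetical.

```python
def peptide_percentile(seq, site_percentile, residues="ST"):
    """Return the best (lowest) percentile over all phosphorylatable sites.

    site_percentile(seq, index) is assumed to return the percentile score
    for treating seq[index] as P0. Returns None if the peptide contains
    no candidate residue.
    """
    percentiles = [
        site_percentile(seq, i)
        for i, residue in enumerate(seq)
        if residue in residues
    ]
    return min(percentiles) if percentiles else None

# Stand-in scorer with made-up percentiles for two sites.
toy_scores = {2: 3.0, 5: 0.4}
best = peptide_percentile("AASAATA", lambda seq, i: toy_scores[i])
```

In this toy case the threonine at index 5 has the lower percentile, so 0.4 becomes the percentile reported for the whole peptide.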

In addition to the measured value, Table 2 tabulates percentile prediction
scores for the validating peptides where the prediction scores were obtained
either by the methods of the invention or by the methods of Cantley and
co-workers. To obtain predictions made as described by Cantley et al., the
sequence of the peptide was analyzed using Scansite (see website at
scansite.mit.edu/). Scansite is a website made publicly available by L. Cantley
and M. Yaffe to predict best substrates based on data derived by the Cantley
degenerate peptide strategy. By both the present methods and by the methods of
Cantley, a lower positive prediction value indicated a stronger prediction that
the peptide will be phosphorylated. Using the conventions of Scansite,
predictive percentile scores greater than 5 were shown as >5.

As shown in Table 2, FIG. 9, and FIG. 10, the methods of the invention
are better predictors of which peptide sequence will be phosphorylated than are
the methods provided by the prior art. For example, peptide SEQ ID NOs: 4, 7,
9 and 11 were highly phosphorylated by the in vitro validating assay but the
Scansite methods predicted significantly poorer levels of phosphorylation than
did the methods of the invention. Similarly, peptide SEQ ID NOs: 60, 64, 66 and
69 were poorly phosphorylated by the in vitro validating assay but the Scansite
methods predicted significantly higher levels of phosphorylation than did the
methods of the invention.

The predictive accuracies of the methods of the invention and those of
Cantley and co-workers (Scansite) are summarized in FIG. 9. FIG. 9 provides a
correlation between the predicted percentile and the measured phosphorylation
for each peptide. Results are shown for three different predictions: predictions
of the invention based only on positions -4 to +3 for PKC-theta; predictions of
the invention based on positions -7 to +6 of PKC-theta; and the Scansite
prediction for PKC-delta. A curve has been overlaid on each of the three plots to
indicate what the correlation might be expected to look like. Note that accurate
predictions will have few peptides in the upper right (false negatives) or the
extreme lower left (false positives). Inspection of FIG. 9 reveals that both
sets of predictions made by using the methods of the present invention are good,
and that the expansion from P-4/P+3 to P-7/P+6 gives modestly improved
predictions. In contrast, the pattern observed with the Scansite prediction
includes many more peptides that are located at positions far from the optimal
correlation.

FIG. 10 tabulates the results obtained. As shown in FIG. 10, the methods
of the invention have approximately 90% specificity and sensitivity while the
methods provided by Scansite have only 70% specificity and 45% sensitivity.
Thus, the methods provided by the invention for predicting kinase specificity are
better than this prior art approach for predicting PKC-delta specificity, even
though the analysis was weighted in favor of the Cantley approach by using
PKC-delta, which was exactly the kinase that Cantley used, and only a close
relative of the kinase used in the methods of the invention (PKC-theta).
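
Sensitivity and specificity figures of this kind can be computed from paired predictions and measurements as sketched below. The cutoffs that define a "predicted substrate" and a "phosphorylated peptide" in FIG. 10 are not stated in this passage, so the cutoff values here are assumptions, and the six (percentile, measured) pairs are merely written in the style of Table 2, with ">5" represented as infinity.

```python
def sensitivity_specificity(records, pct_cutoff, phos_cutoff):
    """Compute (sensitivity, specificity) from (percentile, measured) pairs.

    A peptide is a predicted substrate if its percentile is at or below
    pct_cutoff, and an actual substrate if its measured phosphorylation
    is at or above phos_cutoff. Both cutoffs are assumed values.
    """
    tp = fp = tn = fn = 0
    for percentile, measured in records:
        predicted = percentile <= pct_cutoff
        actual = measured >= phos_cutoff
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity

INF = float("inf")  # stands in for a percentile reported as ">5"
pairs = [(0.0, 100), (0.9, 52), (2.5, 22), (0.1, 2), (INF, 0), (INF, 0)]
sens, spec = sensitivity_specificity(pairs, pct_cutoff=1.0, phos_cutoff=20)
```

With these illustrative cutoffs, two of the three well-phosphorylated peptides are predicted and two of the three poorly phosphorylated peptides are correctly rejected, so both measures come out at two thirds.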
Identification of peptides efficiently phosphorylated by PKC 

A second strategy for validation of the PSSM derived from the methods
described herein is to identify sequences represented in the human proteome that
have low percentiles derived from the PSSM, to synthesize peptides that have
those sequences, and to test the efficiency of phosphorylation of those peptides
by the kinase of interest. FIG. 11 shows the results for such an analysis for 96
individual peptides. The results are shown for individual peptides (FIG. 11,
panel A) or for groups of peptides aggregated by percentile prediction (FIG. 11,
panel B). As with the testing described above with prospectively chosen
peptides, the percentile scores are highly predictive of phosphorylation by the
relevant kinase.

The process of prediction and testing resulted in identification of many
peptides predicted to be substrates for PKC-theta and demonstrated to be
substrates for PKC-theta (Table 3). A number of the sequences surrounding the
most likely phosphorylation site have quite incomplete matches to the prototypic
PKC substrate pattern [RK][RK]x[ST][hydrophobic][RK][RK]. Most of these
peptides/sites have not previously been reported to be substrates for PKC in vivo
or in vitro.
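
The prototypic pattern quoted above can be checked mechanically with a regular expression, as in this hedged sketch. The hydrophobic class is taken here to be I, L, M, V, F and Y, following the color grouping suggested earlier for the PSSM Logo; that choice and the function name are assumptions.

```python
import re

# [RK][RK]x[ST][hydrophobic][RK][RK], with an assumed hydrophobic class.
# The lookahead lets overlapping candidate sites be reported as well.
PROTOTYPIC_PKC = re.compile(r"(?=[RK][RK].[ST][ILMVFY][RK][RK])")

def prototypic_p0_sites(seq):
    """Return 0-based indices of S/T residues that sit at P0 of a full match."""
    return [match.start() + 3 for match in PROTOTYPIC_PKC.finditer(seq)]

full_match = prototypic_p0_sites("KKKKRASFKRKSSKKG")  # SEQ ID NO:2 (Table 2)
no_match = prototypic_p0_sites("RSRSYSRSRSR")         # SEQ ID NO:211 core (Table 3)
```

The first peptide contains the stretch KRASFKR, which satisfies every position of the pattern, whereas the second has no adjacent pair of basic residues and therefore cannot match, even though Table 3 reports it as an in vitro substrate.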


TABLE 3.
Identification of in vitro substrates of PKC-theta with further method
validation

Sequence          SEQ    Locus-   Name                                   Measured in vitro   Prediction
                  ID     Link ID                                         phosphorylation     from
                  NO                                                     by PKC-theta        PKC-theta

-AMSRSA-S-KRRSR-  168    7074     TIAM1                                  100                 0.5
RTRSRRL-T-FRK--   169    1901     S1P1 receptor                          100                 0.0
-VKLRR-S-KKRTKR   170    1794     DOCK2                                  98                  0.1
-RRGRRSTKKRRR     171    55672    FLJ20719                               92                  0.0
--VRRRRSQRISQR    172    25836    IDN3                                   86                  0.0
RSGRRRGSQKS--     173    202      absent in melanoma 1                   85                  0.0
KKERRRNSINRN--    174    4542     myosin IF                              83                  0.0
-KKRRTKSSRRGV-    175    1612     DAP-kinase 1                           80                  0.1
--RRERSRSRRKQ     176    2305     forkhead (Drosophila)-like 16          66                  0.1
-RRRRRRSRTFSR-    177    1196     CLK2                                   66                  0.0
--RRRRSRTFSRS     178    1196     CLK2                                   65                  0.0
-KRHYRKSVRSRS-    179    65125    WNK1                                   65                  0.1
-FLRRSSSRRNRS-    180    9595     PSCDBP                                 65                  0.1
TGERKRKSVRG--     181    6194     ribosomal protein S6                   62                  0.3
-TKKKRGSYRGGS-    182    9221     nucleolar phosphoprotein p130          61                  0.6
-ARRSKRSRRRET-    183    23031    MAST3                                  55                  0.1
--FRASSRSTTK      184    4863     NPAT                                   54                  1.0
KKFKRRLSLTLR--    185    5128     PCTK2                                  51                  0.1
-DFRRRRSFRRIA-    186    5734     prostaglandin E receptor 4             50                  0.0
--LRRKSSTRHIHA    187    672      BRCA1                                  48                  0.2
-ERGRRGSKKGSI-    188    695      BTK                                    44                  0.1
GRRRRSRSKVK--     189    8899     serine/threonine-protein kinase PRP4   43                  0.0
RRRRHTMDKDSR      190    65125    WNK1                                   40                  0.1
--HKRNSVRLVIR     191    409      beta-arrestin2                         38                  0.5
GNRKGKSKKWRQ-     192    2870     GRK6                                   35                  0.5
--PLRKSSLKKGGR    193    393      ARHGAP4                                35                  0.3
-KRRKRKSLORHK-    194    1455     casein kinase 1, gamma 2               [illegible]         [illegible]
PGSSHRKTKK--      195    695      BTK                                    33                  0.8
-RWKRRRSYSREH-    196    1198     CLK3                                   32                  0.1
-ILRPSKSVKLRS-    197    26191    Lyp                                    32                  0.6
-RRRRPTKSKGSK     198    65125    WNK1                                   28                  0.0
-RGRRSRSRLRRR-    199    8899     serine/threonine-protein kinase PRP4   27                  0.0
EQQRRALSFRQ--     200    5778     HePTP                                  26                  1.0
-TQDRRKSLFKKI-    201    23031    MAST3                                  25                  0.2
-VMKRKFSLRAAE-    202    6840     supervillin                            24                  0.6
-VRRSKKSKKKES-    203    23227    MAST4                                  24                  0.3
RPSRRSSSWRIL-     204    4033     LRMP                                   22                  0.6
-EGRRSRSRRYSG-    205    1105     CHD1                                   22                  0.1
KSSRNSTSVKKK-     206    9934     GPR105                                 19                  0.3
-SFRGHITRKKLK-    207    2596     gap-43                                 18                  0.2
-VSRPRKSRKRVD-    208    25836    IDN3                                   17                  0.2
DKEKSKGSLKRK--    209    5777     SHP-1                                  17                  2.0
-PLRRRESMHVEQ-    210    6650     SOLH                                   16                  0.1
RSRSYSRSRSR--     211    4820     NKTR                                   16                  1.0
--VSRGSSLKILSK    212    7852     CXCR4                                  13                  2.0
-RHSRSRSRHRLS-    213    8621     CDC2L5                                 13                  0.8
-SRRRSPSYSRHS-    214    8621     CDC2L5                                 13                  0.3
-TKKRSKSRSKER-    215    8899     serine/threonine-protein kinase PRP4   12                  0.5
-SCRTSSRKRAGK     216    8915     BCL10                                  11                  1.0

Considerations in design of test sets of peptides 

Design of each test set of peptides involves important decisions
regarding: the choice of phosphorylatable residue, the choice of anchor
positions, the identity of residues at the anchor positions, the choice of the query
positions, the identity of residues for the query positions and choice of positions
and residue types for the degenerate positions. These considerations are
discussed in more detail below.

In most embodiments, one position is a residue that can be
phosphorylated (a phosphorylatable amino acid position), such as serine (S),
threonine (T) or tyrosine (Y). As described above, such a phosphorylatable
position is referred to as "P0." The choice between S, T and Y is based on the
known or inferred phosphorylation preference of the kinase(s) whose specificity
is to be assessed. For example, protein kinase C (PKC) phosphorylates a serine
(S) more often than threonine (T). However, data obtained by the inventors
indicate that Rho-kinase generally phosphorylates a threonine (T), and it has
been previously determined that Lck generally phosphorylates a tyrosine (Y).
Hence, one of skill in the art can use available information to assign the identity
of the phosphorylatable amino acid. Alternatively, procedures like those
provided herein or other available procedures can be used to determine which
residues are preferentially phosphorylated by a kinase of unknown specificity.

Selecting the number and identity of Anchor Positions.

Anchor positions in the peptides used in the present methods can be at
any position within the sequence of a test peptide pool. In particular, anchor
positions do not need to be contiguous (i.e. next) to each other in the present
methods. Anchor positions need not be adjacent to the query amino acid
position. Anchor positions also do not need to be adjacent to the
phosphorylatable residue. For example, many of the test sets in the superset of
peptides used for PKC analysis had anchor residues in the pattern Rxx-S-F (see
FIG. 2) where the anchor residue arginine (R) was adjacent neither to the
phosphorylatable residue serine (S) nor to the other anchor residue phenylalanine
(F).

The number of anchor positions selected for a set of peptides can
influence the amount of information obtained about the substrate. In general, if
too many residues are anchored then the test set will be relatively insensitive to
changes in the query residues. However, if too few residues are anchored then
the average amount of phosphorylation in the set will be too low. Low levels of
phosphorylation can lead to error-prone readings. For example, when there is a
low level of phosphorylation, decreases in phosphorylation caused by disfavored
query residues will generally be small and unreliable.

In most embodiments, one or two positions are assigned to be anchor
positions. However, a larger number of anchor residues can be useful in some
embodiments, particularly those designed for particular conditions. As
illustrated herein, some embodiments have two anchor positions. For example,
two anchor residues were used for six of the eight test sets in a superset design
for PKC analysis, i.e. R??-S-F?? (FIG. 2). As shown herein, use of this superset
provides a good characterization of the specificity of PKCs.

Supersets with one anchor position are also very useful. The utility of
such a superset with one anchor position is illustrated by a superset consisting of
8 test sets with the symbolic representation d??R??S????d (FIG. 12). This
d??R??S????d superset is an especially useful superset for initial characterization
of kinases that may be basophilic, because many basophilic kinases have a
strong preference for 'R' at the P-3 position.
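
One way to read the symbolic representation above is that each '?' marks a position that serves as the query position in exactly one test set and is degenerate in the others, which yields the eight test sets mentioned for d??R??S????d. The following Python sketch enumerates the test sets under that reading. The interpretation, the 'q' marker for the query position, and the function name are assumptions made for illustration.

```python
def expand_superset(pattern, p0_index, query_mark="q", degenerate_mark="d"):
    """Enumerate one test-set pattern per '?' in a superset pattern.

    In each returned pattern a single '?' becomes the query position and
    every other '?' is treated as degenerate. p0_index is the 0-based
    index of the phosphorylatable residue, used to label positions.
    Returns a list of (position_label, test_set_pattern) tuples.
    """
    test_sets = []
    for i, symbol in enumerate(pattern):
        if symbol != "?":
            continue
        chars = [degenerate_mark if c == "?" else c for c in pattern]
        chars[i] = query_mark
        label = f"P{i - p0_index:+d}"
        test_sets.append((label, "".join(chars)))
    return test_sets

superset = "d??R??S????d"
test_sets = expand_superset(superset, p0_index=superset.index("S"))
labels = [label for label, _ in test_sets]
```

Under this reading the query positions run from P-5 to P+4, skipping the anchored P-3 and the phosphorylatable P0.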

FIG. 13 shows a PSSM Logo for analysis of the kinase AKT1 with this
superset, which provides a good overview of the preferences of AKT1 at most
positions between P-5 and P+4. Because there is only one anchor residue, the
counts per minute for this superset after phosphorylation are typically lower than
with two suitable anchor positions. However, this superset can still provide an
adequate "dynamic range" showing favored and disfavored residues (FIG. 13).
Data from this analysis provide an approximation of the specificity of AKT1. If
more precise understanding is required, then a suitable second anchor position
can be chosen from the results of this d??R??S????d set, and an additional
superset(s) of test peptides can be synthesized with two anchor positions. One of
skill in the art can envision other one-anchor sets that would be especially useful,
such as d?????SP???d for proline-directed kinases, d?????SQ????d for 'SQ'
directed kinases, and d?????SR???d for 'SR' directed kinases.

According to the invention, several principles guide the choice of a second
anchor position from the results of a one-anchor set such as d??R??S????d. In
general, the second anchor is an amino acid that is strongly preferred by the
kinase of interest. In the case of AKT1, illustrated by FIG. 13, there are multiple
such residues, for example, R at P-5, R at P-2, and F at P+1. In choosing
between those, a secondary consideration is minimizing the number of other
preferred residues at that position. Hence, a second anchor amino acid is
selected as the most preferred of only a few preferred residues at that position.
Based on that criterion, a particularly good choice would be R at P-5. If one of
skill in the art wishes to obtain more detailed information on which anchor
residues to select, multiple second anchors can be chosen and supersets
synthesized to test each anchor position.
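
The two selection principles above (a strongly preferred residue, at a position with few other preferred residues) can be turned into a simple ranking, sketched below. The score cutoff that defines "preferred", the tie-breaking order, and the toy scores are all assumptions; they are not AKT1 data from FIG. 13.

```python
def pick_second_anchor(pssm, preferred_cutoff=1.0):
    """Suggest a second anchor as (position, residue).

    pssm maps a position label to {residue: log score}. A residue counts
    as preferred if its score is at least preferred_cutoff (an assumed
    threshold). Positions with the fewest preferred residues rank first;
    ties are broken by the highest top score.
    """
    candidates = []
    for position, scores in pssm.items():
        preferred = [r for r, s in scores.items() if s >= preferred_cutoff]
        if not preferred:
            continue
        top = max(preferred, key=lambda r: scores[r])
        candidates.append((len(preferred), -scores[top], position, top))
    if not candidates:
        return None
    _, _, position, residue = min(candidates)
    return position, residue

# Hypothetical scores loosely shaped like the AKT1 example in the text.
toy_pssm = {
    "P-5": {"R": 2.0, "K": 0.4, "A": -0.5},
    "P-2": {"R": 2.2, "K": 1.5, "S": 1.1},
    "P+1": {"F": 2.1, "L": 1.4, "M": 1.2},
}
choice = pick_second_anchor(toy_pssm)
```

In this toy matrix R at P-5 is selected even though R at P-2 scores slightly higher, because P-5 has only one preferred residue and so leaves fewer competing residues unprobed.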

It is also important to note that a superset based on no anchors, such as
d????S????d or d????Y????d, can also be useful. Information derived by
analysis with such a set could be particularly useful for choice of a second
anchor (distinct from R at P-3) on which to build a superset conceptually similar
to the d??R??S????d superset.

If sufficient prior knowledge is available, the anchor residues for test sets
can be chosen based on that prior knowledge. The choice of anchor positions
and anchor residue identities for the RxxSF PKC-theta supersets (FIG. 2 and
FIG. 6) was based on prior knowledge of the inventor on PKC specificity, in
which the dominant residues that determine PKC specificity were believed to
include arginine at P-3, arginine at P-2, phenylalanine at P+1, arginine at P+2
and arginine at P+3. Therefore, some or all of such previously identified
residues and/or positions can be chosen for the anchor positions of a particular
test set or superset of peptides.

The method of the invention also provides an approach referred to as
"Optimal Residue Position Scanning" (ORPS) to experimentally determine good
anchor residues when prior knowledge is insufficient. Details of ORPS are
described in Example 9 and Example 12, and their use is further illustrated in
Example 14.

Choice of the query positions and amino acids at the query position. 

In most embodiments each test set has only one query position. This
assures that the difference between peptides in the test set can be clearly
attributed to change in a single amino acid at a standardized position.

Of importance in the current method is the fact that the query position
does NOT need to be adjacent to either an anchor position or to a
phosphorylatable position. This contrasts with pervasive use by previous workers
of query-like positions adjacent to anchor-like positions (and
phosphorylatable-like positions) in methods using "systematic amino acid
variation on template substrate" (SAaVoTS). Particularly notable is that the
extensive work of Tegge and colleagues on finding optimal peptides/inhibitors
was based on query residues adjacent to fixed residues (for example Dostmann WR
et al. 1999. Pharmacol Ther 82:373-387; Tegge W et al. 1995. Biochemistry
34:10569-10577; Tegge WJ et al. 1998. Methods Mol Biol 87:99-106). Thus, the
current method incorporates new flexibility relative to the prior art of
"systematic amino acid variation on template substrate" by placing a query
position at any position relative to the anchor and phosphorylatable positions.

Any amino acid can be selected for placement at the query position.
While in some embodiments all available amino acids are systematically placed
and tested in the query position, in other embodiments only a subset of natural
amino acids are selected for placement in the query position. Hence, in some
embodiments, the test set of peptides would include one peptide for each natural
amino acid. In other embodiments, cysteine is eliminated and only nineteen
alternative amino acid residues are used.

In other embodiments, economy is achieved by assuming that amino 
acids can be subdivided into classes that are most similar in their functional 
properties. For example, using this strategy, a "reduced set" of only about 
thirteen amino acid residues is alternatively placed in the query position, as 
illustrated by FIG. 2 and FIG. 6. For example, one of skill in the art may choose 
to eliminate glutamic acid (E) by virtue of its similarity to aspartic acid (D); 
isoleucine (I), methionine (M) and valine (V) can be eliminated by virtue of their 
similarity to leucine (L); and tyrosine (Y) can be eliminated by virtue of its 
similarity to phenylalanine (F) (see further details in Example 2). 
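
The elimination strategy described above can be sketched in a few lines of 
Python. This is only an illustration under stated assumptions: the function 
name and the optional removal of cysteine are hypothetical choices, and only 
the eliminations named in the text are encoded. The exact membership of the 
reduced set of FIG. 2 is defined in Example 2 and is not reproduced here, so 
the sketch yields fourteen residues rather than the "about thirteen" of the 
figure. 

```python
# The twenty natural amino acids, one-letter codes.
NATURAL = "ACDEFGHIKLMNPQRSTVWY"

# Eliminations named in the text: residue dropped -> similar residue kept.
REPRESENTED_BY = {"E": "D", "I": "L", "M": "L", "V": "L", "Y": "F"}


def reduced_set(residues=NATURAL, represented_by=REPRESENTED_BY, also_drop=""):
    """Return the query-position residues left after eliminating look-alikes.

    `also_drop` is a hypothetical convenience for residues removed for other
    reasons (for example cysteine).
    """
    dropped = set(represented_by) | set(also_drop)
    return [r for r in residues if r not in dropped]


# Cysteine is optionally excluded as well (an assumption for this example).
query_residues = reduced_set(also_drop="C")
print(len(query_residues), "".join(query_residues))
```

Each eliminated residue remains represented in the query position by its 
retained counterpart, which is the economy the text relies on. 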

Choosing Residues and Conditions for Degenerate Positions 

The degenerate amino acid position in the peptide pools can be created 
such that any one of the twenty amino acids can occupy that position. However, 
this strategy can be altered by one of skill in the art to suit the needs of a 
particular test or situation. For example, one of skill in the art may elect not to 
use cysteine because it can give rise to disulfide bonds and dimer formation. 

In other embodiments, residues that may be phosphorylated (e.g. S, T, 
and Y) can be excluded from the degenerate positions. However, serine, 
threonine and tyrosine residues may also be included because they can have a 
role in determining substrate specificity and because the experimental design 
minimizes noise when such residues are used in degenerate positions. For 
example, in the methods of the invention noise from degenerate position serine, 
threonine or tyrosine residues is minimized because of the abundance of the 
selected serine, threonine, or tyrosine residue at the P0 position relative to the 
rarity of these amino acids in degenerate positions. Moreover, phosphorylation 
at the P0 position is selectively enhanced by the anchor residues that guide the 
kinase to phosphorylate the appropriate residue. Hence, the types and positions 
of degenerate residues can be varied as needed. 

Two approaches can be used for inserting a degenerate set of amino acids 
into selected positions of a peptide. In one embodiment, a mixture of selected 
amino acid residues is added by a specific coupling step to create a degenerate 
position. However, different amino acid residues have different coupling 
efficiencies and therefore, if equal amounts of each amino acid are used, each 
amino acid residue may not be equivalently represented at the degenerate 
position. The different coupling efficiencies of different amino acids can be 
compensated for by using a "weighted" mixture of amino acids at a coupling 
step, wherein amino acids with lower coupling efficiencies are present in greater 
abundance than amino acids with higher coupling efficiencies. Conditions of the 
coupling can also be varied to facilitate achievement of a desired mix in the 
synthesized peptide. For example, relatively low molar ratios minimize skewing 
by different coupling efficiencies; also, repetitive additions of low molar ratios 
can augment efficiency while minimizing skewing. 
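
As a rough illustration of the "weighted" mixture idea, the following Python 
sketch assigns each amino acid a molar fraction inversely proportional to its 
relative coupling efficiency. The efficiency values are hypothetical, and the 
simple competitive model (incorporation proportional to molar fraction times 
relative efficiency) is an assumption made for this example, not a description 
of actual coupling chemistry. 

```python
def weighted_mixture(coupling_efficiency):
    """Molar fractions inversely proportional to relative coupling efficiency."""
    inverse = {aa: 1.0 / eff for aa, eff in coupling_efficiency.items()}
    total = sum(inverse.values())
    return {aa: w / total for aa, w in inverse.items()}


def expected_incorporation(fractions, coupling_efficiency):
    """Relative incorporation under the assumed competitive-coupling model."""
    raw = {aa: fractions[aa] * coupling_efficiency[aa] for aa in fractions}
    total = sum(raw.values())
    return {aa: v / total for aa, v in raw.items()}


# Hypothetical relative efficiencies, for illustration only.
efficiency = {"G": 1.0, "L": 0.8, "V": 0.5, "R": 0.4}

mix = weighted_mixture(efficiency)
incorporated = expected_incorporation(mix, efficiency)

for aa in efficiency:
    print(aa, round(mix[aa], 3), round(incorporated[aa], 3))
```

Under this model the slower-coupling residues are supplied in greater 
abundance and the expected representation at the degenerate position comes out 
equal for every residue. 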

In an alternative embodiment, the resin upon which the peptides are 
synthesized is divided into equivalent portions and then each portion is subjected 
to a separate coupling reaction that employs a distinct type of amino acid. After 
this coupling reaction, the resin aliquots are recombined and the procedure is 
repeated for each degenerate position. This approach results in approximately 
equivalent representation of each different amino acid residue at the degenerate 
position. 
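
The divide, couple and recombine procedure can be simulated with a short 
Python sketch. It is a simplified model under stated assumptions: resin 
particles are treated as list entries, every coupling is taken to go to 
completion, and the function name, residue choice and particle count are 
hypothetical. 

```python
import random
from collections import Counter


def split_and_recombine(n_particles, residues, n_positions, seed=0):
    """Simulate split synthesis; returns one peptide string per resin particle."""
    rng = random.Random(seed)
    particles = [""] * n_particles
    for _ in range(n_positions):
        rng.shuffle(particles)  # recombine and mix the resin
        # Divide into equal portions and couple one residue type per portion.
        for k, aa in enumerate(residues):
            for i in range(k, n_particles, len(residues)):
                # Synthesis proceeds C- to N-terminus, so prepend the residue.
                particles[i] = aa + particles[i]
    return particles


peptides = split_and_recombine(n_particles=400, residues="ADKL", n_positions=3)
per_position = [Counter(p[pos] for p in peptides) for pos in range(3)]
print(per_position[0])
```

Because each portion receives exactly one residue type, representation at 
every degenerate position is equal in this idealized model, independent of 
coupling efficiency differences. 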

The abundance of residues at the degenerate positions in the peptides can 
be controlled by a variety of different strategies (see FIG. 14). One procedure 
for controlling the abundance of residues at the degenerate position is shown as 
Plan 1 in FIG. 14, where an equal abundance of each amino acid residue is 
selected for each position. However, in many embodiments the abundance of 
amino acids is based on prior knowledge of the abundance of residues in human 
proteins or relevant regions thereof. One such embodiment utilized the average 
abundance of various amino acids in the human proteome. The abundance of 
amino acids in human proteins was determined by reference to sequences 
tabulated by the National Center for Biotechnology Information (Plan 2, FIG. 
14). 

In another embodiment, the abundance of various amino acids at a 
degenerate position correlates with the abundance of that amino acid in known 
kinase substrates (Plan 3, FIG. 14). Plan 3 of FIG. 14 takes into account the 
physiological relevance of various residues and resembles the residue abundance 
found in physiologic substrates for the kinase(s). To this end, the inventor has 
accumulated a list of known or suspected substrate sites for PKC and has 
determined the residue frequency in the regions surrounding those sites (Plan 3, 
FIG. 14). The intent was to create a method that screens the most relevant 
peptide sequences for targeted biological processes. 
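
A tabulation of residue frequency around substrate sites, in the spirit of 
Plan 3, might be sketched as follows. The sequence windows below are made up 
for illustration and are not taken from the inventor's list of PKC substrate 
sites; the function name and the window layout (P-3 to P+3 with the 
phosphorylatable residue in the center) are likewise assumptions. 

```python
from collections import Counter


def residue_frequencies(site_windows, center):
    """Fraction of each residue over all positions flanking the phospho-site."""
    counts = Counter()
    for window in site_windows:
        for i, aa in enumerate(window):
            if i != center and aa != "-":  # skip P0 and any padding
                counts[aa] += 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Made-up windows (P-3..P+3, phospho-acceptor at index 3); not real substrates.
windows = ["RKASFKK", "KRPSLRR", "GRKTFKA"]
freqs = residue_frequencies(windows, center=3)

for aa, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(aa, round(f, 3))
```

The resulting fractions could then serve as target abundances for the 
degenerate positions, in place of the equal abundances of Plan 1. 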

Hence, in some embodiments a degenerate mixture of residues is used 
that resembles the types of amino acid residues thought to be most relevant to a 
particular kinase. Implementing this improvement by deviating from equal 
abundance is not a problem in the present method but could be a problem in 
prior art approaches (e.g. U.S. Patent 6,004,757 to Cantley) because prior art 
approaches depend on detection of substrate residues by sequence analysis of the 
phosphorylated product, and a low abundance of a particular residue in the 
degenerate peptide pool being phosphorylated would decrease the reliability of 
detecting such a difference. 

Additional residues beyond the core peptide 

The peptide pools in a test set or in a superset can include additional 
residues at either the N-terminus or C-terminus (or both). Such additional amino 
acid residues may provide additional attachment points or other functions useful 
to one of skill in the art. For example, in the ninety peptide test set having the 
formula Rxx-S-F (FIG. 2), each peptide included a three residue N-terminal 
linker of biotinylated lysine, dansylated lysine and glycine. The biotin moiety 
provided an efficient mechanism for capture of the peptide before, during or 
after an assay. The dansyl moiety also provided a convenient means to quantify 
the amount of each peptide by measuring light absorption at 335 nm. The 
glycine provided flexibility in connecting the linker to the remainder of the 
peptide. Hence, such linkers can be used in the methods, articles and kits of the 
invention. 

Examples of other variations in test sets of peptides 

The number of peptide pools in a test set can vary. In some 
embodiments, the number of peptide pools in the test set is equivalent to the 
number of amino acids tested at the query position. Hence, for example, if all 
twenty naturally-occurring amino acids are tested in the test set, the number of 
peptide pools would be twenty. However, in many embodiments, fewer than 
twenty amino acids are tested because one of skill in the art may have 
information indicating that certain amino acids need not be tested. Moreover, 
many amino acid analogs are available to one of skill in the art and in some 
instances the skilled artisan may choose to test such an amino acid analog at the 
query position. In such instances, amino acid analogs may be used in the test 
sets of the invention and the number of peptide pools can be greater than twenty. 
Also, under special circumstances it is useful to use a mixture of amino acids, 
such as (R + K) or (D + E) instead of a single amino acid at a query position. 
Similarly, special circumstances may dictate use of a limited mix of amino acids 
at the phosphorylatable position (such as S + T), or at an anchor position (such 
as I + L + M + V). Note that FIG. 2 illustrates that the same degenerate peptide 
can be used in three different sets: for example, the peptide symbolized by 
'ddddRdd-S-Fdd' (shaded) was an element of the P-3 set, the P-0 set, and the 
P+1 set. 

The number of test sets in a superset or collection of peptide pools can 
also vary. In general a superset has at least two test sets of peptide pools. 
Typically the number of test sets corresponds to the number of positions around 
the phosphorylation site that are being tested, which is usually in the range of 
from about five to about twenty positions (or test sets). Moreover, a given test 
set can be used as part of different supersets. Also, practical considerations such 
as the number of wells in a standardized plate (e.g. 96 or 384) often contribute to 
the choices made regarding the number of peptide pools in a test set, and the 
number of test sets in a superset. Moreover, different test sets can be used as 
part of different supersets. 
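
The relationship between query residues, test sets, supersets and plate 
capacity can be made concrete with a small Python sketch. The template string, 
the fourteen query residues and the helper names are hypothetical choices made 
for illustration; 'd' marks a degenerate position, and for simplicity only 
degenerate positions are scanned here (anchor and phosphorylatable positions 
are left fixed). 

```python
def build_test_set(template, query_index, query_residues):
    """One peptide pool per query residue placed at `query_index`."""
    pools = []
    for aa in query_residues:
        chars = list(template)
        chars[query_index] = aa
        pools.append("".join(chars))
    return pools


def build_superset(template, query_indices, query_residues):
    """One test set per query position."""
    return {i: build_test_set(template, i, query_residues) for i in query_indices}


# Anchors R (P-3) and F (P+1) with phospho-acceptor S (P0); dashes omitted.
template = "ddddRddSFdd"
degenerate = [i for i, c in enumerate(template) if c == "d"]

superset = build_superset(template, degenerate, "ADFGHKLNPQRSTW")
unique_pools = {p for pools in superset.values() for p in pools}

plates_96 = -(-len(unique_pools) // 96)  # ceiling division
print(len(superset), len(unique_pools), plates_96)
```

With eight scanned positions and fourteen query residues this gives 112 
pools, which exceeds one 96-well plate but fits comfortably in a 384-well 
plate, illustrating how plate format can drive the choice of set sizes. 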

The length of a peptide in a peptide pool can also vary. For example, 
although the amino acid sequences described in this application are often about 
five to about fifteen amino acids in length, a peptide that is shorter than five 
amino acids may be used in some embodiments. For example, a peptide as short 
as about three amino acids in length may be used as a substrate. The upper size 
of the peptides used in the test sets and supersets is not critical and can vary as 
desired by one of skill in the art. However, peptides that are chemically 
synthesized become more expensive as their length increases. Hence, one of 
skill in the art may choose to limit the size of the peptides employed to about 
100 or fewer amino acids, or about 50 or fewer amino acids, or about 30 or 
fewer amino acids, or about 25 or fewer amino acids. 

In some embodiments the peptide pools used in the test sets and supersets 
of the invention are soluble pools of peptides. The term "soluble peptide pools" 
is intended to mean a population of peptides that are not attached to a solid 
support at the time they are subjected to phosphorylation. 

In alternative embodiments, the peptides used in the test sets and 
supersets of the invention can be attached to a solid support such as a bead, a 
well of a microtiter dish, a membrane or a plastic pin. For general descriptions of 
the construction of solid-support bound peptide libraries see for example 
Geysen, H. M., et al. (1986) Mol. Immunol. 23:709-715; Lam, K. S., et al. 
(1991) Nature 354:82-84; and Pinilla, C., et al. (1992) BioTechniques 
13:901-905. For this type of library, the peptides can be synthesized while 
attached to a solid support such as a bead, and degenerate positions are created 
by splitting the population of beads, coupling different amino acids to different 
subpopulations and recombining the beads. The final product is a population of 
beads each carrying many copies of a single unique peptide. This approach has 
been termed "one bead/one peptide". 

The choice of a soluble versus immobilized format should not be based 
solely on convenience of the assay; some studies conducted by the inventors 
suggest that significant differences in specificity are observed with the same 
peptides assayed in solution versus assays performed on immobilized peptides. 
Therefore, the distinction between soluble and immobilized may be of 
considerable importance. The use of soluble peptide pools as the preferred 
embodiment of this invention distinguishes the invention from many prior 
methods performed with immobilized peptides. Also, those of skill in the art 
should carefully assess all the implications of these alternative formats when 
choosing the design of test sets of peptides for particular applications. 

The peptides utilized in the test sets and supersets of the invention can be 
prepared by any method available to one of skill in the art. For example, the 
peptides can be constructed by in vitro chemical synthesis, for example using an 
automated peptide synthesizer. As described herein the peptides can be soluble 
peptide pools or the peptides can be attached to a solid support such as a bead, 
membrane, microtiter well, tube or other convenient solid support. 

Standard techniques for in vitro chemical synthesis of peptides are 
known in the art. For example, peptides can be synthesized by 
(benzotriazolyloxy)tris(dimethylamino)phosphonium hexafluorophosphate 
(BOP)/1-hydroxybenzotriazole coupling protocols. Automated peptide 
synthesizers are commercially available (e.g., Milligen/Biosearch 9600). For 
general descriptions of the construction of soluble synthetic peptide libraries see 
for example Houghten, R. A., et al. (1991) Nature 354:84-86 and Houghten, R. 
A., et al. (1992) BioTechniques 13:412-421. 

Analysis of kinase specificity with non-degenerate peptides 

Although degenerate peptides are particularly useful for studying kinase 
peptide specificity, strategic use of non-degenerate peptides can also be effective 
for identifying new substrates (Tables 3, 4, 5, 9). The present invention also 
teaches strategic design of sets of single sequence peptides (i.e. no degenerate 
positions) so that they can be used for elucidating kinase peptide specificity of 
basophilic kinases (Example 13 and Example 14). 

Binding Entities that Bind to Substrates of Kinases 

The invention also contemplates binding entities that can bind to peptides 
or proteins that may be phosphorylated by a kinase. In some embodiments, the 
binding entities bind to the non-phosphorylated substrate; in other embodiments 
the binding entities bind to phosphorylated substrates. 

For example, as illustrated herein, a site-specific phospho-antibody was 
generated and used to detect phosphorylation at a specific peptidyl sequence. A 
phospho-peptide having sequence CDKEKSKG-(pS)-LKRK-OH (SEQ ID 
NO:570) was made. This sequence (without phosphorylation) comprises the 
C-terminus of SHP-1 and was chosen for study because the methods of the current 
invention predicted that it was a candidate site for phosphorylation by PKC (see 
Example 10). This phospho-peptide includes a sequence that corresponds to the 
C-terminus of SHP-1 but, in addition, it has an N-terminal cysteine useful for 
coupling to a carrier. The corresponding non-phosphorylated peptide was also 
synthesized for use as a control. The phospho-peptide (SEQ ID NO:570) was 
coupled onto a KLH carrier, rabbits were immunized, and anti-sera samples were 
screened for reactivity with the SEQ ID NO:570 phospho-peptide by ELISA 
assay. Antibodies reactive with the corresponding non-phosphorylated peptide 
were removed from anti-sera by passing the anti-sera through a column having the 
non-phosphorylated peptide bound to the column matrix. Finally, anti-sera were 
enriched for phospho-specific reactivity by use of an affinity column made from 
the phospho-peptide. The antibody preparation so produced was called the 
anti-pS591 antibody preparation. 

The specificity of the antibody for SHP-1 pS591 was confirmed by 
Western blot analysis (see FIG. 43). When the anti-SHP-1 pS591 antibody was 
used at a dilution of 1:15,000, only a single strong band was detected on a 
Western blot of a lysate of Jurkat cells. The position of this band was 
characteristic of SHP-1. In contrast, in similar experiments, an antibody that 
binds generally to sites phosphorylated by PKC bound to many bands. This 
antibody facilitated studies of the functional importance of phosphorylation of 
this site in SHP-1 (see Example 10). 

Thus the invention provides binding entities that can selectively bind to 
sites that are phosphorylated by various kinases. In other embodiments, the 
binding entities selectively bind to non-phosphorylated sites that normally are 
recognized by kinases. Such binding entities can be used in vitro or in vivo for 
detecting phosphorylated or non-phosphorylated peptides or proteins or for 
modulating the function of a phosphorylated or non-phosphorylated protein. As 
used herein, a binding entity is any small molecule, peptide, or polypeptide that 
can bind to a peptidyl substrate site of a kinase. In some embodiments, the 
binding entities are antibodies. 

Hence, binding entities can bind to a phosphorylated peptidyl substrate 
sequence but exhibit significantly less or substantially no binding to the 
corresponding non-phosphorylated peptidyl substrate sequence. Binding entities 
of the invention can also bind to a non-phosphorylated peptidyl substrate 
sequence but exhibit significantly less or substantially no binding to the 
corresponding phosphorylated peptidyl substrate sequence. 

For example, binding entities and antibodies contemplated by the 
invention may bind to a peptide having a combination of SEQ ID NO:76, 81, 82, 
87, 89-92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113, 121, 124, 127-129, 131- 
134, 136, 139, 143, 144, 149, 151-154, 160, 163-171, 173-177, 179, 182-192, 
196-206, 208-211, 213-216, 474-517 or 570. In another embodiment, binding 
entities and antibodies of the invention bind to a peptide having SEQ ID NO:76, 
81, 82, 87, 89-92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113, 121, 124, 127- 
129, 131-134, 136, 139, 143, 144, 149, 151-154, 160, 163-171, 173-177, 179, 
182-192, 196-206, 208-211, 213-216, 474-517, or 570, but not any other of the 
peptides. In further embodiments of the invention, binding entities and 
antibodies of the invention bind to a phosphorylated peptide having one of SEQ 
ID NO:76, 81, 82, 87, 89-92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113, 121, 
124, 127-129, 131-134, 136, 139, 143, 144, 149, 151-154, 160, 163-171, 173- 
177, 179, 182-192, 196-206, 208-211, 213-216, 474-517 or 570, but exhibit 
significantly less or substantially no binding to the corresponding non- 
phosphorylated peptidyl substrate sequence. 

In still further embodiments of the invention, binding entities and 
antibodies of the invention bind to a non-phosphorylated peptide having one of 
SEQ ID NO:76, 81, 82, 87, 89-92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113, 
121, 124, 127-129, 131-134, 136, 139, 143, 144, 149, 151-154, 160, 163-171, 
173-177, 179, 182-192, 196-206, 208-211, 213-216, 474-517 or 570, but exhibit 
significantly less or substantially no binding to the corresponding 
phosphorylated peptidyl substrate sequence. 

In some embodiments, the binding entities recognize phosphorylated or 
non-phosphorylated peptidyl sequences having any one of SEQ ID NO:89, 102, 
110, 112, 127, 177, 182, 209, 474-488 or 489. In other embodiments, the 
binding entities recognize phosphorylated or non-phosphorylated peptidyl 
sequences having any one of SEQ ID NO:173, 185, 192, 196, 200, 490-491 or 
492. 

In further embodiments, the binding entities further differentiate between 
a phosphorylated peptide having any one of SEQ ID NO:298, 301-324, 326-347, 
349-400, 402-410, 412-473, 571-643 or 644, and a non-phosphorylated peptide 
that differs from the phosphorylated peptide by substitution of Ser for the pSer 
or substitution of a Thr for the pThr. For example, such a phosphorylated 
peptide can have any one of SEQ ID NO:298, 320, 324, 350, 351, 366, 388, 394, 
398, 402, 418, 464, 571-595 or 596. In other embodiments, the phosphorylated 
peptide can have any one of SEQ ID NO:301, 310, 317, 322, 344, 352, 371, 406, 
597-599 or 600. One example of a preferred binding entity of the invention is a 
binding entity that binds to a phosphorylated peptide that includes SEQ ID 
NO:298. Another example of a preferred binding entity of the invention is a 
binding entity that binds to a phosphorylated peptide that includes SEQ ID 
NO:313 or 314. Another example of a preferred binding entity of the invention 
is a binding entity that binds to a phosphorylated peptide that includes SEQ ID 
NO:361 or 362. 

The invention provides antibodies and binding entities made by available 
procedures that can bind a non-phosphorylated peptide or phosphorylated 
peptide of the invention. The binding domains of such antibodies, for example, 
the CDR regions of these antibodies, can also be transferred into or utilized with 
any convenient binding entity backbone. 

Antibody molecules belong to a family of plasma proteins called 
immunoglobulins, whose basic building block, the immunoglobulin fold or 
domain, is used in various forms in many molecules of the immune system and 
other biological recognition systems. A standard antibody is a tetrameric 
structure consisting of two identical immunoglobulin heavy chains and two 
identical light chains and has a molecular weight of about 150,000 daltons. 

The heavy and light chains of an antibody consist of different domains. 
Each light chain has one variable domain (VL) and one constant domain (CL), 
while each heavy chain has one variable domain (VH) and three or four constant 
domains (CH). See, e.g., Alzari, P. N., Lascombe, M.-B. & Poljak, R. J. (1988) 
Three-dimensional structure of antibodies. Annu. Rev. Immunol. 6, 555-580. 
Each domain, consisting of about 110 amino acid residues, is folded into a 
characteristic β-sandwich structure formed from two β-sheets packed against 
each other, the immunoglobulin fold. The VH and VL domains each have three 
complementarity determining regions (CDR1-3) that are loops, or turns, 
connecting β-strands at one end of the domains. The variable regions of both the 
light and heavy chains generally contribute to antigen specificity, although the 
contribution of the individual chains to specificity is not always equal. Antibody 
molecules have evolved to bind to a large number of molecules by using six 
randomized loops (CDRs). 

Immunoglobulins can be assigned to different classes depending on the 
amino acid sequences of the constant domain of their heavy chains. There are at 
least five (5) major classes of immunoglobulins: IgA, IgD, IgE, IgG and IgM. 
Several of these may be further divided into subclasses (isotypes), for example, 
IgG-1, IgG-2, IgG-3 and IgG-4; IgA-1 and IgA-2. The heavy chain constant 
domains that correspond to the IgA, IgD, IgE, IgG and IgM classes of 
immunoglobulins are called alpha (α), delta (δ), epsilon (ε), gamma (γ) and mu 
(μ), respectively. The light chains of antibodies can be assigned to one of two 
clearly distinct types, called kappa (κ) and lambda (λ), based on the amino 
acid sequences of their constant domain. The subunit structures and three- 
dimensional configurations of different classes of immunoglobulins are well 
known. 

The term "variable" in the context of variable domain of antibodies, 
refers to the fact that certain portions of variable domains differ extensively in 
sequence from one antibody to the next. The variable domains are responsible 
for binding and determine the specificity of each particular antibody for its 
particular antigen. However, the variability is not evenly distributed through the 
variable domains of antibodies. Instead, the variability is concentrated in three 
segments called complementarity determining regions (CDRs), also known as 
hypervariable regions, in both the light chain and the heavy chain variable 
domains. 

The more highly conserved portions of variable domains are called 
framework (FR) regions. The variable domains of native heavy and light chains 
each comprise four FR regions, largely adopting a β-sheet configuration, 
connected by three CDRs, which form loops connecting, and in some cases 
forming part of, the β-sheet structure. The CDRs in each chain are held together 
in close proximity by the FR regions and, with the CDRs from another chain, 
contribute to the formation of the antigen-binding site of antibodies. The 
constant domains are not involved directly in binding an antibody to an antigen, 
but exhibit various effector functions, such as participation of the antibody in 
antibody-dependent cellular toxicity. 

An antibody that is contemplated for use in the present invention thus can 
be in any of a variety of forms, including a whole immunoglobulin, an antibody 
fragment such as Fv, Fab, and similar fragments, a single chain antibody which 
includes the variable domain complementarity determining regions (CDR), and 
the like forms, all of which fall under the broad term "antibody", as used herein. 
The present invention contemplates the use of any specificity of an antibody, 
polyclonal or monoclonal, and is not limited to antibodies that recognize and 
immunoreact with a specific peptide sequence described herein or a derivative 
thereof. 

Moreover, the binding regions, or CDR, of antibodies can be placed 
within the backbone of any convenient binding entity polypeptide. In preferred 
embodiments, in the context of methods described herein, an antibody, binding 
entity or fragment thereof is used that is immunospecific for any of the peptides 
described herein, as well as the derivatives thereof, including the phosphorylated 
derivatives thereof. 

The term "antibody fragment" refers to a portion of a full-length 
antibody, generally the antigen binding or variable region. Examples of antibody 
fragments include Fab, Fab', F(ab')2 and Fv fragments. Papain digestion of 
antibodies produces two identical antigen binding fragments, called Fab 
fragments, each with a single antigen binding site, and a residual Fc fragment. 
Fab fragments thus have an intact light chain and a portion of one heavy chain. 
Pepsin treatment yields an F(ab')2 fragment that has two antigen binding 
fragments that are capable of cross-linking antigen, and a residual fragment that 
is termed a pFc' fragment. Fab' fragments are obtained after reduction of a 
pepsin digested antibody, and consist of an intact light chain and a portion of the 
heavy chain. Two Fab' fragments are obtained per antibody molecule. Fab' 
fragments differ from Fab fragments by the addition of a few residues at the 
carboxyl terminus of the heavy chain CH1 domain including one or more 
cysteines from the antibody hinge region. 

Fv is the minimum antibody fragment that contains a complete antigen 
recognition and binding site. This region consists of a dimer of one heavy and 
one light chain variable domain in a tight, non-covalent association (VH-VL 
dimer). It is in this configuration that the three CDRs of each variable domain 
interact to define an antigen binding site on the surface of the VH-VL dimer. 
Collectively, the six CDRs confer antigen binding specificity to the antibody. 
However, even a single variable domain (or half of an Fv comprising only three 
CDRs specific for an antigen) has the ability to recognize and bind antigen, 
although at a lower affinity than the entire binding site. As used herein, 
"functional fragment" with respect to antibodies, refers to Fv, F(ab) and F(ab')2 
fragments. 

Additional fragments can include diabodies, linear antibodies, single- 
chain antibody molecules, and multispecific antibodies formed from antibody 
fragments. Single chain antibodies are genetically engineered molecules 
containing the variable region of the light chain and the variable region of the 
heavy chain, linked by a suitable polypeptide linker as a genetically fused single 
chain molecule. Such single chain antibodies are also referred to as "single-chain 
Fv" or "sFv" antibody fragments. Generally, the Fv polypeptide further comprises 
a polypeptide linker between the VH and VL domains that enables the sFv to form 
the desired structure for antigen binding. For a review of sFv see Pluckthun in 
The Pharmacology of Monoclonal Antibodies, vol. 113, Rosenburg and Moore 
eds. Springer-Verlag, N.Y., pp. 269-315 (1994). 

The term "diabodies" refers to small antibody fragments with two 
antigen-binding sites, where the fragments comprise a heavy chain variable 
domain (VH) connected to a light chain variable domain (VL) in the same 
polypeptide chain (VH-VL). By using a linker that is too short to allow pairing 
between the two domains on the same chain, the domains are forced to pair with 
the complementary domains of another chain and create two antigen-binding 
sites. Diabodies are described more fully in, for example, EP 404,097; WO 
93/11161, and Hollinger et al., Proc. Natl. Acad. Sci. USA 90:6444-6448 (1993). 

Antibody fragments contemplated by the invention are therefore not full- 
length antibodies. However, such antibody fragments can have similar or 
improved immunological properties relative to a full-length antibody. Such 
antibody fragments may be as small as about 4 amino acids, 5 amino acids, 6 
amino acids, 7 amino acids, 9 amino acids, about 12 amino acids, about 15 
amino acids, about 17 amino acids, about 18 amino acids, about 20 amino acids, 
about 25 amino acids, about 30 amino acids or more. 

In general, an antibody fragment of the invention can have any upper size 
limit so long as it has similar or improved immunological properties relative to 
an antibody that binds with specificity to a peptide or phosphorylated peptide 
described herein. For example, smaller binding entities and light chain antibody 
fragments can have less than about 200 amino acids, less than about 175 amino 
acids, less than about 150 amino acids, or less than about 120 amino acids if the 
antibody fragment is related to a light chain antibody subunit. Moreover, larger 
binding entities and heavy chain antibody fragments can have less than about 
425 amino acids, less than about 400 amino acids, less than about 375 amino 
acids, less than about 350 amino acids, less than about 325 amino acids or less 
than about 300 amino acids if the antibody fragment is related to a heavy chain 
antibody subunit. 

Antibodies directed against disease markers can be made by any 
available procedure. Methods for the preparation of polyclonal antibodies are 
available to those skilled in the art. See, for example, Green, et al., Production 
of Polyclonal Antisera, in: Immunochemical Protocols (Manson, ed.), pages 1-5 
(Humana Press); Coligan, et al., Production of Polyclonal Antisera in Rabbits, 
Rats, Mice and Hamsters, in: Current Protocols in Immunology, section 2.4.1 
(1992), which are hereby incorporated by reference. 

Monoclonal antibodies can also be employed in the invention. The term 
"monoclonal antibody" as used herein refers to an antibody obtained from a 
population of substantially homogeneous antibodies. In other words, the 
individual antibodies comprising the population are identical except for 
occasional naturally occurring mutations in some antibodies that may be present 
in minor amounts. Monoclonal antibodies are highly specific, being directed 
against a single antigenic site. Furthermore, in contrast to polyclonal antibody 
preparations that typically include different antibodies directed against different 
determinants (epitopes), each monoclonal antibody is directed against a single 
determinant on the antigen. In addition to their specificity, the monoclonal 
antibodies are advantageous in that they are synthesized by the hybridoma 
culture, uncontaminated by other immunoglobulins. The modifier "monoclonal" 
indicates the character of the antibody as 
being obtained from a substantially homogeneous population of antibodies, and 
is not to be construed as requiring production of the antibody by any particular 
method. 

The monoclonal antibodies herein specifically include "chimeric" 
antibodies in which a portion of the heavy and/or light chain is identical or 
homologous to corresponding sequences in antibodies derived from a particular 
species or belonging to a particular antibody class or subclass, while the 
remainder of the chain(s) is identical or homologous to corresponding sequences 
in antibodies derived from another species or belonging to another antibody class 
or subclass. Fragments of such antibodies can also be used, so long as they 
exhibit the desired biological activity. See U.S. Patent No. 4,816,567; Morrison
et al., Proc. Natl. Acad. Sci. 81, 6851-55 (1984). The monoclonal antibodies
herein also specifically include those made from different animal species, 
including mouse, rat, human and rabbit. 

The preparation of monoclonal antibodies likewise is conventional. See,
for example, Kohler & Milstein, Nature, 256:495 (1975); Coligan, et al., sections
2.5.1-2.6.7; and Harlow, et al., in: Antibodies: A Laboratory Manual, page 726
(Cold Spring Harbor Pub. (1988)), which are hereby incorporated by reference.
Monoclonal antibodies can be isolated and purified from hybridoma cultures by
a variety of well-established techniques. Such isolation techniques include
affinity chromatography with Protein-A Sepharose, size-exclusion
chromatography, and ion-exchange chromatography. See, e.g., Coligan, et al.,
sections 2.7.1-2.7.12 and sections 2.9.1-2.9.3; Barnes, et al., Purification of
Immunoglobulin G (IgG), in: Methods in Molecular Biology, Vol. 10, pages 79-104
(Humana Press (1992)).

Methods of in vitro and in vivo manipulation of antibodies are available
to those skilled in the art. For example, the monoclonal antibodies to be used in
accordance with the present invention may be made by the hybridoma method as
described above or may be made by recombinant methods, e.g., as described in
U.S. Pat. No. 4,816,567. Monoclonal antibodies for use with the present
invention may also be isolated from phage antibody libraries using the
techniques described in Clackson et al., Nature 352:624-628 (1991), as well as in
Marks et al., J. Mol. Biol. 222:581-597 (1991).

Methods of making antibody fragments are also known in the art (see, for
example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring
Harbor Laboratory, New York, (1988), incorporated herein by reference).
Antibody fragments of the present invention can be prepared by proteolytic
hydrolysis of the antibody or by expression of nucleic acids encoding the
antibody fragment in a suitable host. Antibody fragments can be obtained by
pepsin or papain digestion of whole antibodies by conventional methods. For
example, antibody fragments can be produced by enzymatic cleavage of
antibodies with pepsin to provide a 5S fragment described as F(ab')2. This
fragment can be further cleaved using a thiol reducing agent, and optionally
using a blocking group for the sulfhydryl groups resulting from cleavage of
disulfide linkages, to produce 3.5S Fab' monovalent fragments. Alternatively,
enzymatic cleavage using pepsin produces two monovalent Fab' fragments and
an Fc fragment directly. These methods are described, for example, in U.S.
Patents No. 4,036,945 and No. 4,331,647, and references contained therein.
These patents are hereby incorporated by reference in their entireties.

Other methods of cleaving antibodies, such as separation of heavy chains
to form monovalent light-heavy chain fragments, further cleavage of fragments,
or other enzymatic, chemical, or genetic techniques may also be used, so long as
the fragments bind to the antigen that is recognized by the intact antibody. For
example, Fv fragments comprise an association of VH and VL chains. This
association may be noncovalent or the variable chains can be linked by an
intermolecular disulfide bond or cross-linked by chemicals such as
glutaraldehyde. Preferably, the Fv fragments comprise VH and VL chains
connected by a peptide linker. These single-chain antigen binding proteins (sFv)
are prepared by constructing a structural gene comprising DNA sequences
encoding the VH and VL domains connected by an oligonucleotide. The
structural gene is inserted into an expression vector, which is subsequently
introduced into a host cell such as E. coli. The recombinant host cells synthesize
a single polypeptide chain with a linker peptide bridging the two V domains.
Methods for producing sFvs are described, for example, by Whitlow, et al.,
Methods: a Companion to Methods in Enzymology, Vol. 2, page 97 (1991);
Bird, et al., Science 242:423-426 (1988); Ladner, et al., U.S. Patent No. 4,946,778;
and Pack, et al., Bio/Technology 11:1271-77 (1993).

Another form of an antibody fragment is a peptide coding for a single 
complementarity-determining region (CDR). CDR peptides ("minimal 
recognition units") are often involved in antigen recognition and binding. CDR 
peptides can be obtained by cloning or constructing genes encoding the CDR of 
an antibody of interest. Such genes are prepared, for example, by using the 
polymerase chain reaction to synthesize the variable region from RNA of 
antibody-producing cells. See, for example, Larrick, et al., Methods: a
Companion to Methods in Enzymology, Vol. 2, page 106 (1991).

The invention contemplates human and humanized forms of non-human
(e.g., murine) antibodies. Such humanized antibodies are chimeric
immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab,
Fab', F(ab')2 or other antigen-binding subsequences of antibodies) that contain
minimal sequence derived from non-human immunoglobulin. For the most part,
humanized antibodies are human immunoglobulins (recipient antibody) in which
residues from a complementarity determining region (CDR) of the recipient are
replaced by residues from a CDR of a nonhuman species (donor antibody) such
as mouse, rat or rabbit having the desired specificity, affinity and capacity.
In some instances, Fv framework residues of the human immunoglobulin
are replaced by corresponding non-human residues. Furthermore, humanized
antibodies may comprise residues that are found neither in the recipient antibody
nor in the imported CDR or framework sequences. These modifications are
made to further refine and optimize antibody performance. In general,
humanized antibodies will comprise substantially all of at least one, and
typically two, variable domains, in which all or substantially all of the CDR
regions correspond to those of a non-human immunoglobulin and all or
substantially all of the FR regions are those of a human immunoglobulin
consensus sequence. The humanized antibody optimally also will comprise at
least a portion of an immunoglobulin constant region (Fc), typically that of a
human immunoglobulin. For further details, see: Jones et al., Nature 321, 522-
525 (1986); Riechmann et al., Nature 332, 323-329 (1988); Presta, Curr. Op.
Struct. Biol. 2, 593-596 (1992); Holmes, et al., J. Immunol., 158:2192-2201
(1997) and Vaswani, et al., Annals Allergy, Asthma & Immunol., 81:105-115
(1998).

While standardized procedures are available to generate antibodies, the
size of antibodies, the multi-stranded structure of antibodies and the complexity
of six binding loops present in antibodies constitute a hurdle to the improvement
and the manufacture of large quantities of antibodies. Hence, the invention
further contemplates using binding entities, which comprise polypeptides that
can recognize and bind to kinase substrates provided herein.

A number of proteins can serve as protein scaffolds to which binding
domains can be attached and thereby form a suitable binding entity. The binding
domains bind or interact with the peptide sequences of the invention while the
protein scaffold merely holds and stabilizes the binding domains so that they can
bind. A number of protein scaffolds can be used. For example, phage capsid
proteins can be used. See the review in Clackson & Wells, Trends Biotechnol.
12:173-184 (1994). Phage capsid proteins have been used as scaffolds for
displaying random peptide sequences, including bovine pancreatic trypsin
inhibitor (Roberts et al., PNAS 89:2429-2433 (1992)), human growth hormone
(Lowman et al., Biochemistry 30:10832-10838 (1991); Venturini et al., Protein
Peptide Letters 1:70-75 (1994)), and the IgG binding domain of Streptococcus
(O'Neil et al., Techniques in Protein Chemistry V (Crabb, J., ed.) pp. 517-524,
Academic Press, San Diego (1994)). These scaffolds have displayed a single
randomized loop or region that can be modified to include binding domains for
kinase substrates.

Researchers have also used the small 74 amino acid α-amylase inhibitor
Tendamistat as a presentation scaffold on the filamentous phage M13. McConnell, S.
J., & Hoess, R. H., J. Mol. Biol. 250:460-470 (1995). Tendamistat is a β-sheet protein
from Streptomyces tendae. It has a number of features that make it an attractive
scaffold for binding entities, including its small size, stability, and the availability of
high resolution NMR and X-ray structural data. The overall topology of Tendamistat is
similar to that of an immunoglobulin domain, with two β-sheets connected by a series
of loops. In contrast to immunoglobulin domains, the β-sheets of Tendamistat are held
together with two rather than one disulfide bond, accounting for the considerable
stability of the protein. The loops of Tendamistat can serve a similar function to the
CDR loops found in immunoglobulins and can be easily randomized by in vitro
mutagenesis. Tendamistat is derived from Streptomyces tendae and may be antigenic
in humans. Hence, binding entities that employ Tendamistat are preferably employed
in vitro.

Fibronectin type III domain has also been used as a protein scaffold to which
binding entities can be attached. Fibronectin type III is part of a large subfamily (Fn3
family or s-type Ig family) of the immunoglobulin superfamily. Sequences, vectors
and cloning procedures for using such a fibronectin type III domain as a protein
scaffold for binding entities (e.g. CDR peptides) are provided, for example, in U.S.
Patent Application Publication 20020019517. See also, Bork, P. & Doolittle, R. F.
(1992) Proposed acquisition of an animal protein domain by bacteria. Proc. Natl. Acad.
Sci. USA 89, 8990-8994; Jones, E. Y. (1993) The immunoglobulin superfamily. Curr.
Opinion Struct. Biol. 3, 846-852; Bork, P., Holm, L. & Sander, C. (1994) The
immunoglobulin fold. Structural classification, sequence patterns and common core. J.
Mol. Biol. 242, 309-320; Campbell, I. D. & Spitzfaden, C. (1994) Building proteins
with fibronectin type III modules. Structure 2, 233-337; Harpaz, Y. & Chothia, C.
(1994).

In the immune system, specific antibodies are selected and amplified from a
large library (affinity maturation). The combinatorial techniques employed in immune
cells can be mimicked by mutagenesis and generation of combinatorial libraries of
binding entities. Variant binding entities, antibody fragments and antibodies therefore
can also be generated through display-type technologies. Such display-type
technologies include, for example, phage display, retroviral display, ribosomal display,
and other techniques. Techniques available in the art can be used for generating
libraries of binding entities and for screening those libraries, and the selected binding
entities can be subjected to additional maturation, such as affinity maturation. See Wright
and Harris, supra; Hanes and Plückthun, PNAS USA 94:4937-4942 (1997) (ribosomal
display); Parmley and Smith, Gene 73:305-318 (1988) (phage display); Scott, TIBS
17:241-245 (1992); Cwirla et al., PNAS USA 87:6378-6382 (1990); Russel et al., Nucl.
Acids Research 21:1081-1085 (1993); Hoogenboom et al., Immunol. Reviews 130:43-68
(1992); Chiswell and McCafferty, TIBTECH 10:80-84 (1992); and U.S. Pat. No.
5,733,743.

The invention therefore also provides methods of mutating antibodies, CDRs or
binding domains to optimize their affinity, selectivity, binding strength and/or other
desirable properties. A mutant binding domain refers to an amino acid sequence
variant of a selected binding domain (e.g. a CDR). In general, one or more of the
amino acid residues in the mutant binding domain is different from what is present in
the reference binding domain. Such mutant binding domains necessarily have less than 100%
sequence identity or similarity with the reference amino acid sequence. In general,
mutant binding domains have at least 75% amino acid sequence identity or similarity
with the amino acid sequence of the reference binding domain. Preferably, mutant
binding domains have at least 80%, more preferably at least 85%, even more preferably
at least 90%, and most preferably at least 95% amino acid sequence identity or
similarity with the amino acid sequence of the reference binding domain.
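
The identity thresholds above can be illustrated with a minimal sketch. It assumes the reference and mutant binding domains have already been aligned to equal length without gaps; the function names and the 20-residue example sequences are hypothetical and are not taken from the specification.

```python
def percent_identity(reference: str, mutant: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(reference) != len(mutant):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for r, m in zip(reference, mutant) if r == m)
    return 100.0 * matches / len(reference)


def identity_tier(identity: float) -> str:
    """Report the highest threshold named in the text that the identity meets."""
    for threshold in (95, 90, 85, 80, 75):
        if identity >= threshold:
            return f"at least {threshold}%"
    return "below 75%"


# Hypothetical 20-residue binding domain with two substitutions (positions 8 and 20).
reference = "ARDYYGSSYWYFDVWGQGTT"
mutant = "ARDYYGSAYWYFDVWGQGTS"

identity = percent_identity(reference, mutant)
print(identity, identity_tier(identity))
```

Sequence similarity, as opposed to identity, would additionally require a substitution matrix, which this sketch does not attempt.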

For example, affinity maturation using phage display can be utilized as one
method for generating mutant binding domains. Affinity maturation using phage
display refers to a process described in Lowman et al., Biochemistry 30(45):10832-10838
(1991); see also Hawkins et al., J. Mol. Biol. 254:889-896 (1992). While not
strictly limited to the following description, this process can be described briefly as
involving mutation of several binding domains or antibody hypervariable regions at a
number of different sites with the goal of generating all possible amino acid
substitutions at each site. The binding domain mutants thus generated are displayed in a
monovalent fashion from filamentous phage particles as fusion proteins. Fusions are
generally made to the gene III product of M13. The phage expressing the various
mutants can be cycled through several rounds of selection for the trait of interest, e.g.
binding affinity or selectivity. The mutants of interest are isolated and sequenced.
Such methods are described in more detail in U.S. Patent 5,750,373, U.S. Patent
6,290,957 and Cunningham, B. C. et al., EMBO J. 13(11), 2508-2515 (1994).

Therefore, in one embodiment, the invention provides methods of manipulating
binding entity or antibody polypeptides or the nucleic acids encoding them to generate
binding entities, antibodies and antibody fragments with improved binding properties
that recognize kinase substrate sequences.

Such methods of mutating portions of an existing binding entity or antibody
involve fusing a nucleic acid encoding a polypeptide that encodes a binding domain for
a disease marker to a nucleic acid encoding a phage coat protein to generate a
recombinant nucleic acid encoding a fusion protein, mutating the recombinant nucleic
acid encoding the fusion protein to generate a mutant nucleic acid encoding a mutant
fusion protein, expressing the mutant fusion protein on the surface of a phage, and
selecting phage that bind to a kinase substrate.

Accordingly, the invention provides antibodies, antibody fragments, and
binding entity polypeptides that can recognize and bind to a kinase substrate (e.g., a
peptide sequence having any of the peptidyl sequences described herein). The
invention further provides methods of manipulating those antibodies, antibody
fragments, and binding entity polypeptides to optimize their binding properties or other
desirable properties (e.g., stability, size, ease of use).

Such phospho-antibody production is well known to practitioners of the
art; pertinent descriptions of such approaches include those described in
Current Protocols in Cell Biology, Chap. 16, Antibodies as Cell
Biological Tools, unit 16.6, Production of Antibodies That Recognize Specific
Tyrosine-Phosphorylated Peptides. In particular, methods available in the art
include purification of binding entities that bind specifically to the
phosphorylated peptide; depletion of binding entities that cross-react with the
non-phosphorylated peptide; and depletion of binding entities that cross-react
with a distinct phosphopeptide.

Kinases that can be used in the Methods of the Invention

The methods of the invention can be used to identify the specificity of
any type of wild type or mutant kinase from any prokaryotic or eukaryotic
species. For example, the kinase can be a protein-serine/threonine specific
kinase (in which case a peptide library or set with a fixed non-degenerate serine
or threonine is used), a protein-tyrosine specific kinase (in which case a peptide
library or set with a fixed non-degenerate tyrosine is used) or a dual-specificity
kinase (in which case a peptide library or set with a fixed non-degenerate
serine, threonine or tyrosine can be used). Examples of protein kinases that can
be utilized in the methods of the invention can also be found in Hanks et al.
(1988) Science 241:42-52 and Manning G et al. 2002. Science 298:1912-1934.
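
The pairing of kinase class with the fixed, non-degenerate phosphorylatable residue described above can be written down as a small lookup. This is only an illustrative sketch; the dictionary keys and function name are hypothetical labels, not terms defined by the specification.

```python
# Fixed (non-degenerate) phospho-acceptor residues permitted for each kinase class,
# in one-letter code: S = serine, T = threonine, Y = tyrosine.
FIXED_ACCEPTORS = {
    "serine/threonine": ("S", "T"),
    "tyrosine": ("Y",),
    "dual-specificity": ("S", "T", "Y"),
}


def library_is_compatible(kinase_class: str, fixed_residue: str) -> bool:
    """True if a library with this fixed residue suits the given kinase class."""
    try:
        allowed = FIXED_ACCEPTORS[kinase_class]
    except KeyError:
        raise ValueError(f"unknown kinase class: {kinase_class!r}") from None
    return fixed_residue.upper() in allowed


print(library_is_compatible("tyrosine", "Y"))          # True
print(library_is_compatible("serine/threonine", "Y"))  # False
```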

Protein-serine/threonine specific kinases that can be used in the methods
of the invention include any of those listed herein as well as: 1) cyclic
nucleotide-dependent kinases, such as cyclic-AMP-dependent protein kinases
(e.g., protein kinase A) and cyclic-GMP-dependent protein kinases; 2) calcium-
phospholipid-dependent kinases, such as protein kinase C; 3) calcium-
calmodulin-dependent kinases, including CaMII, phosphorylase kinase (PhK),
myosin light chain kinases (e.g., MLCK-K, MLCK-M), PSK-H1 and PSK-C3;
4) the SNF1 family of protein kinases (e.g., SNF1, nim1, KIN1 and KIN2); 5)
casein kinases (e.g., CKII); 6) the Raf-Mos proto-oncogene family of kinases,
including Raf, A-Raf, PKS and Mos; and 7) the STE7 family of kinases (e.g.
STE7 and PBS2). Additionally, the protein-serine/threonine specific kinase can
be a kinase involved in cell cycle control. Many kinases involved in cell cycle
control have been identified. Cell cycle control kinases include the cyclin
dependent kinases, which are heterodimers of a cyclin and kinase (such as cyclin
B/p33cdc2, cyclin A/p33CDK2, cyclin E/p33CDK2 and cyclin D1/p33CDK4). Other
cell cycle control kinases include Wee1 kinase, Nim1/Cdr1 kinase, Wis1 kinase
and NIMA kinase.

Protein-tyrosine specific kinases that can be used in the methods of the
invention include: 1) members of the src family of kinases, including pp60c-src,
pp60v-src, Yes, Fgr, FYN, LYN, LCK, HCK, Dsrc64 and Dsrc28; 2) members of
the Abl family of kinases, including Abl, ARG, Dash, Nabl and Fes/Fps; 3)
members of the epidermal growth factor receptor (EGFR) family of kinases,
including EGFR, v-Erb-B, NEU and DER; 4) members of the insulin receptor
(INS.R) family of kinases, including INS.R, IGF1R, DILR, Ros, 7less,
TRK and MET; and 5) members of the platelet-derived growth factor receptor
(PDGFR) family of kinases, including PDGFR, CSF1R, Kit and RET.

Other protein kinases which can be used in the method of the invention
include syk, ZAP70, Focal Adhesion Kinase, erk1, erk2, erk3, MEK, CSK,
BTK, ITK, TEC, TEC-2, JAK-1, JAK-2, LET23, c-fms, S6 kinases (including
p70 S6 and RSKs), TGF-β/activin receptor family kinases and Clk.

Kits

The invention is further directed to a kit having a test set or an array of 
peptide pools for identifying kinase substrate specificities. The peptides used in 
the test sets and arrays can be soluble peptides or peptides attached to a solid
support. Instructions for using the array can also be included in the kit.

As described above, a test set contains peptide pools, wherein every 
peptide in each of the peptide pools has an amino acid that can be 
phosphorylated by a kinase, a query amino acid, at least one anchor amino acid, 
and at least one degenerate amino acid. The amino acid that can be 
phosphorylated by a kinase is at a defined phosphorylation position and every 
peptide of every peptide pool within a test set of peptide pools has an identical 
amino acid that can be phosphorylated by a kinase in that phosphorylation 
position. The query amino acid is at a defined query position within a test set 
but the query amino acid's identity at that defined query position is 
systematically varied from one peptide pool to the next peptide pool within a test 
set of peptide pools. Each anchor amino acid is at a defined anchor position 
within a test set and an identical anchor amino acid is present at that defined 
position in every peptide of every peptide pool in the test set, but each test set of 
the series of test sets can have different anchor amino acids. The at least one 
degenerate amino acid is an unknown amino acid selected from a degenerate 
mixture of amino acids. 
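
The structure of a test set can be sketched in a few lines. The sketch below writes each peptide pool as a pattern string in which the phosphorylatable residue and the anchor residue(s) are fixed, the query position takes one residue per pool, and every remaining position is marked `X` for a degenerate mixture. The function name, the 0-based indexing, and the 9-mer example (serine at index 4, arginine anchor at index 1, query at index 5) are illustrative assumptions, not sequences from the specification.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 naturally occurring residues
DEGENERATE = "X"                      # position filled from a degenerate mixture


def build_test_set(length, phospho_pos, phospho_residue, anchors, query_pos,
                   query_residues=AMINO_ACIDS):
    """Return {query residue: pool pattern} for one test set (0-based positions)."""
    if not anchors:
        raise ValueError("at least one anchor position is required")
    if phospho_pos in anchors:
        raise ValueError("an anchor cannot occupy the phosphorylation position")
    fixed = dict(anchors)
    fixed[phospho_pos] = phospho_residue
    if query_pos in fixed:
        raise ValueError("query position collides with a fixed position")
    if any(not 0 <= p < length for p in list(fixed) + [query_pos]):
        raise ValueError("position outside the peptide")
    if len(fixed) + 1 >= length:
        raise ValueError("at least one degenerate position is required")

    pools = {}
    for q in query_residues:
        pattern = [DEGENERATE] * length
        for pos, residue in fixed.items():
            pattern[pos] = residue   # identical in every pool of the test set
        pattern[query_pos] = q       # systematically varied from pool to pool
        pools[q] = "".join(pattern)
    return pools


test_set = build_test_set(length=9, phospho_pos=4, phospho_residue="S",
                          anchors={1: "R"}, query_pos=5)
print(test_set["A"])   # XRXXSAXXX
print(len(test_set))   # 20 pools, one per query residue
```

A series of test sets would repeat this call with different `anchors` or a different `query_pos`; a reduced set of query residues can be passed through `query_residues`.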

The methods and kits of the invention can be used to determine an amino 
acid sequence motif for the phosphorylation site of any kinase. The preferred 
embodiment of such kits includes software to facilitate calculation of results, 
determination of derived parameters such as residue preference and scores for a 
position specific scoring matrix, and display of results in informative formats 
such as the PSSM Logo. The kits of the invention can also include any item, 
reagent or solution useful for performing the methods of the invention. Such 
items can include microtiter plates, arrays of peptide pools where the peptides 
are attached to a solid support, tubes for diluting reagents, and the like. Reagents 
useful for performing the methods of the invention include, for example, ATP,
γ-labeled ATP, cations and co-factors typically utilized by kinases. Solutions
useful for performing the method include buffer solutions for controlling or
adjusting the pH of the kinase assay mixture, sterile deionized water for diluting
and reconstituting reagents, and the like.

WO 2005/028666    PCT/US2004/029397

The invention is further illustrated by the following non-limiting 
Examples. 

EXAMPLE 1: Peptide synthesis and in vitro kinase assay 
Materials 

DIEA, piperidine (peptide synthesis grade), and TFA (HPLC grade) were
obtained from Chem-Impex (Wood Dale, IL). DMF, ACN, MTBE, and MeOH
were obtained from EM Science (Gibbstown, NJ). HOBT and HBTU (peptide
synthesis grade) were obtained from AnaSpec (San Jose, CA). Fmoc-amino acid
derivatives were obtained from AnaSpec (San Jose, CA) and Chem-Impex
(Wood Dale, IL). Biotin was obtained from SynPep (Dublin, CA).
Peptide Synthesis

Peptides were synthesized as C-terminal amides on Mimotopes (Clayton,
Australia) SynPhase Rink amide acrylic-grafted polypropylene solid support
(loading 7.5 µmole), arranged in a 12 x 8 format, in 96 well microtiter plates.
Amino acid solution delivery was facilitated by a PinPal Amino Acid Indexer to
indicate the appropriate amino acid to be delivered for each peptide in each
coupling cycle. A solution containing a mixture of nineteen amino acids was
delivered for specific peptides and coupling cycles to create degenerate peptides.
Activation was performed in situ with a solution of 0.1 M HOBT/HBTU/DIEA
in DMF. Each unique peptide sequence was synthesized with an N-terminal
Biotin-Lys-Gly spacer. A dansyl group was attached to the side chain of the
spacer lysine to serve as a chromophore (330 nm) to facilitate peptide
quantification. Deprotection with 25% piperidine and the DMF and methanol washes
were performed batchwise. After completion of the synthesis, the peptides were
cleaved from the solid support and deprotected by acidolysis in the presence of
scavengers using TFA/EDT/TA/anisole 90:4:3:3 (v/v/v/v). The crude peptides
were precipitated and washed three times with cold MTBE, and lyophilized from
water/ACN/HOAc 8:1:1 (v/v/v).
Analysis 

The peptide products were validated and quantified via high throughput
LC-MS. The system consisted of a Shimadzu (Columbia, MD) VP series HPLC
system and a PE Sciex (Foster City, CA) API 165 single quadrupole mass
spectrometer. Reverse phase separations of 1 µL injections were performed
using two Phenomenex (Torrance, CA) 30 x 1.0 mm Luna 3 µ C8 columns at 50 °C
with a flow rate of 350 µL/min. The peptides were eluted by a linear gradient
from 0% to 60% MeOH (0.1% HOAc) over five minutes and detected at 330 nm
and 220 nm. For each LC-MS injection, (M+H)/z was extracted from MS data
and compared to the expected mass for that sample, as calculated from its
sequence. The UV absorbance trace was integrated to determine purity and
yield.

Degenerate Peptide Quantification

Absorbance data for 10 µL aliquots of degenerate peptide solution were
acquired using a Labsystems (Beverly, MA) Multiskan Ascent plate reader
equipped with a 340 nm filter. Yield was determined using a concentration
factor calculated from absorbance data acquired on the same system from
samples of known concentration that also contained a dansyl chromophore.

Dried degenerate peptides were reconstituted in 90% water/10% ethanol.
The concentration of peptide was determined by measurement of absorption at
335 nm (the maximal absorption wavelength for the dansyl group), and the stock
was diluted to 1 mM and stored in sealed wells at 4 °C. A replica plate was prepared with
peptides at 100 µM concentration in 90% water/10% ethanol and stored similarly.
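
The quantification and dilution steps above reduce to two small calculations, sketched below. The sketch assumes a linear absorbance response through the origin; the standards, absorbance readings and volumes are invented for illustration, and the function names are hypothetical.

```python
def concentration_factor(standards):
    """Least-squares slope through the origin, in µM per absorbance unit.

    `standards` is a list of (absorbance, known concentration in µM) pairs.
    """
    numerator = sum(a * c for a, c in standards)
    denominator = sum(a * a for a, _ in standards)
    if denominator == 0:
        raise ValueError("standards must include a non-zero absorbance")
    return numerator / denominator


def dilution_volumes(stock_um, target_um, final_ul):
    """Volumes of stock and diluent (µl) giving `final_ul` at `target_um` (C1*V1 = C2*V2)."""
    if target_um > stock_um:
        raise ValueError("cannot dilute to a higher concentration")
    stock_ul = target_um * final_ul / stock_um
    return stock_ul, final_ul - stock_ul


# Invented dansyl standards: (absorbance, µM).
standards = [(0.1, 50.0), (0.2, 100.0), (0.4, 200.0)]
factor = concentration_factor(standards)

# An unknown reading of 0.3 absorbance units under the same conditions.
unknown_um = 0.3 * factor

# Replica plate: 1 mM (1000 µM) stock diluted to 100 µM in a 200 µl well.
stock_ul, diluent_ul = dilution_volumes(1000.0, 100.0, 200.0)
print(factor, unknown_um, stock_ul, diluent_ul)
```

The 1 mM to 100 µM step is a tenfold dilution regardless of the well volume chosen.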
Kinase preparations 

Catalytically active preparations of the kinases of interest were either
purchased or prepared. Purchased and tested active kinase preparations
included the following: PKC-alpha, PKC-delta, PKC-epsilon, PKC-zeta,
PKC-mu, PKA and PKG from Calbiochem; ROK alpha/ROCK-II, active, from Upstate
Biotechnology; and AKT1 from Panvera.

An example of the purification procedure used for production of active
kinase is as follows. A preparation of PKC-theta was prepared using a Gateway
expression construct containing PKC-theta that was expressed in baculovirus,
which was used to infect Sf9 cells. The cell pellet from a liter of baculovirus-
infected Sf9 cells was resuspended in 20 volumes (60 ml) of extraction buffer
(20 mM Na phosphate buffer pH 7.5, 500 mM NaCl, 5 mM pyrophosphate, 10%
glycerol, 10 mM imidazole, 1 mM PMSF), sonicated twice for one minute (1 cm
tip at 60% power and 50% duty cycle) and cell disruption was verified
microscopically. The sample was adjusted to 5 mM MgCl2 and treated with
one unit benzonase/ml for an additional 20 minutes on ice. The sample was
clarified by centrifugation in a JA-20 rotor at 15K for 30 min at 4 °C, filtered
through a 0.8 µm filter and applied at 0.5 ml/min to a one ml chelating
sepharose column previously charged with nickel and equilibrated with
extraction buffer. The column was washed with extraction buffer at one ml/min
to baseline and eluted in a 20 ml gradient (20-500 mM imidazole in extraction
buffer) into one ml fractions that were analyzed by SDS-PAGE. Fractions with
the highest concentration of protein were pooled and dialyzed twice against
one liter of 20 mM Na PO4 pH 7.5, 50 mM NaCl buffer. The kinase pool was
dialyzed twice against 20 mM HEPES pH 7.4, 100 mM NaCl, 2 mM EDTA, 5
mM DTT, 0.05% Triton-X-100. After dialysis, the sample was adjusted to 50%
glycerol and quick-frozen in a dry ice/ethanol bath.

More than 20 other preparations of PKC-theta have also been prepared
and tested in the inventor's laboratory. They have typically been transiently
expressed in HEK293 cells, and purified by His-tag based isolation conceptually
similar to that described above. Alternatively, they were immunoaffinity
purified using anti-HA tag antibody to capture the protein when it has been fused
to an HA epitope tag; such preps are released by incubation in an excess
concentration of HA peptide. These include preparations derived from more than
10 different variant constructs of PKC-theta. Point mutations have been
produced using the QuikChange system from Stratagene, using the
manufacturer's suggested procedures.
Kinase assay 

The conditions of the kinase assay and the amount of active kinase used
varied with the kinase and with the accuracy needed. For a typical experiment,
5-20 ng of kinase was used per well and each peptide pool was assayed in
duplicate wells. Note that the absolute amount of kinase used was not usually a
critical parameter, because the desired information related to specificity of the
kinase, not its absolute activity, and robustness of the assay depends on
comparisons of the same amount of kinase on different peptides. The
combination of kinase concentration and assay duration was modified to assure
that the stoichiometry of peptide phosphorylation never exceeded 5%. The
choice of kinase buffer depended on the kinase being analyzed. For studies of
PKC, 100 mM HEPES, 0.05% Triton-X100, 1 mM CaCl2, 20 mM MgCl2,
0.2 mg/ml phosphatidyl serine (Avanti Polar Lipids), PMA 100 ng/ml was
typically used. The lipid stock was prepared by transferring 3 mg phosphatidyl
serine into an iced mixture of 450 µl water plus 50 µl of 10% Triton-X100, and
sonicating 10 times on ice for 1 sec each.

The kinase reaction mixture was assembled by sequential addition to a
tube held on ice of: 5 µl peptide (100 µM for final concentration of 10 µM), 15 µl
of kinase (typically 5 ng/well, in appropriate kinase buffer), 30 µl of ATP
(1 µCi/well of 32P-gamma ATP in a stock of 167 µM cold ATP in the kinase
buffer; for final concentration of 100 µM ATP). The mixture was rapidly
warmed to desired reaction temperature (30 °C for PKC) and incubated for the
desired duration (usually 10 minutes). The kinase assay was terminated by
transfer to a 4 °C water bath, and rapid addition of an equal volume (50 µl) of stop
solution [0.1 M ATP + 0.1 M EDTA in water, pH 8].
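
As a check on the volumes quoted above, the final concentrations in the 50 µl reaction follow from simple dilution arithmetic. The sketch below uses only the stock concentrations and volumes stated in the text; the function and variable names are illustrative.

```python
def final_concentration(stock_um, added_ul, total_ul):
    """Concentration (µM) after adding `added_ul` of stock to a `total_ul` mixture."""
    return stock_um * added_ul / total_ul


peptide_ul, kinase_ul, atp_ul = 5.0, 15.0, 30.0
total_ul = peptide_ul + kinase_ul + atp_ul           # 50 µl reaction

peptide_final_um = final_concentration(100.0, peptide_ul, total_ul)  # 10 µM
atp_final_um = final_concentration(167.0, atp_ul, total_ul)          # about 100 µM

print(total_ul, peptide_final_um, round(atp_final_um, 1))
```

The 167 µM ATP stock is thus the concentration that yields approximately 100 µM once diluted 30:50 in the reaction.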

The peptides were then captured from the reaction mixture by transfer to
Reacti-Bind Streptavidin High Binding Capacity Coated Plates (HBC) (Pierce
Biotechnology) as follows. The HBC plates were pre-rinsed three times with
PBS/Tween20 0.05% (PBS/Tween). Part or all of the reaction
mixture was then transferred to wells of an HBC plate pre-filled with 90 µl of
phosphate-buffered saline (PBS); typically aliquots of each phosphorylation
reaction were transferred to duplicate HBC plates to assure accuracy by
additional replication.

For kinase assays done at the standard peptide concentration of 10 µM,
the peptide concentration in the reaction mixture becomes 5 µM after addition of
the stop solution; consequently 10 µl of the reaction (50 pmoles of peptide) was
transferred to the HBC plate. More generally, the amount of reaction mixture
transferred was estimated to be about 50 pmoles of peptide. The inventor had
validated that 50 pmoles of peptide was reliably and completely captured by the
wells, which had a nominal binding capacity of 125 pmoles. The HBC plates were
incubated for 0.5 to 1.5 hr at room temperature for complete binding of
biotinylated peptides to plate-bound streptavidin. The HBC plates were then
washed extensively with PBS/Tween. Five washes were done routinely and
additional wash steps were added if the wash solution removed from the plate
had measurable radioactivity as detected using a Geiger counter. This step is
essential to obtaining a good signal to noise ratio because the radioactivity
incorporated in the peptides was a tiny fraction of the total in the
reaction mixture. The wells were air-dried. A volume of 40 - 50 µl of
MicroScint-20 (Packard Instruments) was added to each well. The plates were
covered with a stick-on film sheet. Radioactive emissions were measured in a
TopCount NXT Microplate Scintillation and Luminescence Counter (Packard
Instruments). Typically samples were counted for 5 minutes (or more) to
improve the signal to noise ratio when counts were low.
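The transfer volume described above can be checked with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, and the 1 µM case is an assumed example of how the same 50 pmole target could be reached at a lower peptide concentration.

```python
# Illustrative check of the peptide amount transferred to each HBC well.
# Helper names are hypothetical; uM x ul gives pmol.

def diluted_conc_um(reaction_um: float, reaction_ul: float, stop_ul: float) -> float:
    """Peptide concentration (uM) after the stop solution is added."""
    return reaction_um * reaction_ul / (reaction_ul + stop_ul)

def pmol_in_aliquot(reaction_um: float, reaction_ul: float,
                    stop_ul: float, aliquot_ul: float) -> float:
    """Amount of peptide (pmol) in an aliquot of the stopped reaction."""
    return diluted_conc_um(reaction_um, reaction_ul, stop_ul) * aliquot_ul

def aliquot_for_target(reaction_um: float, reaction_ul: float,
                       stop_ul: float, target_pmol: float) -> float:
    """Aliquot volume (ul) that carries `target_pmol` of peptide."""
    return target_pmol / diluted_conc_um(reaction_um, reaction_ul, stop_ul)

WELL_CAPACITY_PMOL = 125.0  # nominal binding capacity quoted in the text

standard = pmol_in_aliquot(10.0, 50.0, 50.0, 10.0)
fraction_of_capacity = standard / WELL_CAPACITY_PMOL
low_conc_volume = aliquot_for_target(1.0, 50.0, 50.0, 50.0)

print(standard, fraction_of_capacity, low_conc_volume)
```

At the standard concentration the aliquot uses well under half of the nominal well capacity, which is consistent with the statement that capture was complete.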

EXAMPLE 2: Use of Reduced Set of Query Residues

The methods described herein provide for systematic variation of the
query amino acid between peptide pools of a test set. In one embodiment, all
naturally occurring residues will occupy the query amino acid position. In other
embodiments, such as illustrated in FIG. 2 and FIG. 6, peptide pool variations at
the query position were selected from a reduced set of amino acids.

Because scoring of potential sites in proteins requires a PSSM that
includes information on all naturally occurring residues, use of reduced sets
requires extrapolation of information from tested residues to residues that have
not been tested. The methods of the invention can readily be expanded to
include additional residues that provide data to test whether the extrapolated
results (e.g. those at the bottom of the chart in FIG. 5) are valid.

For example, FIGs. 16 and 17 show scores for the P+1 position of PKC
theta using test set 1 (see also FIG. 2) and a test set 2 that is identical in
sequence except that it includes 4 additional query residues and was synthesized
several months after test set 1. The two sets were tested in two different
experiments that were performed several months apart. Nonetheless, the table and
graph in FIGs. 16 and 17 show that the scores for the residues tested are in
very good agreement. The results also showed generally adequate agreement
between values extrapolated for untested residues and the values subsequently
experimentally determined for those residues. For example, the Log Score for
methionine at position P+1 was extrapolated to be 0.7 and experimentally shown
to be 0.8. However, the experimentally determined Log Score value for tyrosine
(0.5) did differ somewhat from the extrapolated value (1.4). Because the
differences in extrapolated and experimentally determined values for tyrosine
and phenylalanine were larger than optimal, in preferred embodiments test sets
include both F and Y as query residues.

EXAMPLE 3: Scoring phosphorylation site sequences from a PSSM and
predicting best phosphorylation sites

The prior art provides a scoring system by which kinase substrate
preferences can be used to make predictions about phosphorylation by the kinase
(Yaffe MB, Leparc GG, Lai J, Obata T, Volinia S, Cantley LC. 2001. A
motif-based profile scanning approach for genome-wide prediction of signaling
pathways. Nat Biotechnol 19:348-353). This example illustrates how that
scoring approach is done and validates the methods described herein when
applied to a known PKC substrate.
Methods Employed 

As shown in FIG. 18, a raw total score can readily be calculated for any
peptide sequence using the data in a PSSM, for example, the PSSMs provided in
FIG. 5, FIG. 7, and FIG. 16. The total score was determined by adding together
the PSSM score for each of the residues of the peptide. This type of calculation
is illustrated in FIG. 18 for a peptide corresponding to a known PKC
phosphorylation site in the protein MARCKS having the sequence
KKKKKRF-S-FKKSFK (SEQ ID NO:80). The score derived was for the sequence
surrounding the Ser-159 of the intact MARCKS protein. For example, because
the P-7 position of MARCKS was occupied by K, a score of 0.4 from column P-7
of FIG. 7 was used. The scores for the other thirteen residues were similarly
derived from columns of FIG. 5, FIG. 7, and FIG. 17. The fourteen scores were
combined for a total score of 7.4 for the KKKKKRF-S-FKKSFK (SEQ ID NO:80)
sequence in MARCKS.
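The summation described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the calculation of FIG. 18: the data layout, the function name `raw_total_score`, and every PSSM value except the quoted 0.4 for K at P-7 are hypothetical placeholders.

```python
from typing import Dict

# position relative to P0 -> residue -> log score (layout is an assumption)
Pssm = Dict[int, Dict[str, float]]

def raw_total_score(window: str, pssm: Pssm, n_before: int = 7) -> float:
    """Sum the PSSM entries for each residue of `window`.

    `window` is the site sequence written without dashes; index `n_before`
    is the phosphorylatable residue (P0). Residues or positions missing from
    the PSSM contribute 0.0, which is a simplification of this sketch.
    """
    total = 0.0
    for i, residue in enumerate(window):
        position = i - n_before
        total += pssm.get(position, {}).get(residue, 0.0)
    return total

# Toy PSSM: only the P-7 value for K (0.4) is quoted in the text. Every other
# number is a made-up placeholder, not data from FIG. 5, 7, 16 or 17.
toy_pssm: Pssm = {
    -7: {"K": 0.4},
    -2: {"R": 1.0},
    0: {"S": 0.0, "T": -0.5},
    1: {"F": 0.5},
}

marcks_site = "KKKKKRF" + "S" + "FKKSFK"  # P-7..P-1, P0, P+1..P+6
toy_total = raw_total_score(marcks_site, toy_pssm)
print(round(toy_total, 2))
```

With these placeholder values the toy total is far from 7.4; reproducing the reported score would require the complete PSSM columns from the figures.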

The raw total scores are informative in ranking individual peptides.
However, it was even more useful to estimate the relative likelihood of
phosphorylation of a peptide compared to many other peptides in the human
proteome (i.e. proteins encoded by human genes). Such an estimate can be
conveniently represented by a percentile score. To convert a raw score for a
peptide to a percentile score, a relevant set of peptide scores must first be
collected and sorted. Then, the relative position of the raw total score within
that ordered set is determined.
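The conversion from raw score to percentile can be sketched as follows. This is a minimal illustration assuming that "percentile" means the percentage of background scores at or above the raw score, so that lower values are better; the function name and the ten background scores are invented for the example and are not the proteome-wide distribution.

```python
from bisect import bisect_left
from typing import List

def top_percentile(raw_score: float, sorted_background: List[float]) -> float:
    """Percent of background scores that are >= raw_score (lower is better).

    `sorted_background` must be sorted in ascending order.
    """
    n = len(sorted_background)
    n_at_or_above = n - bisect_left(sorted_background, raw_score)
    return 100.0 * n_at_or_above / n

# Made-up background scores standing in for the proteome-wide score set.
background = sorted([-3.0, -2.1, -1.5, -0.9, -0.9, -0.2, 0.4, 1.3, 2.8, 6.2])

best = top_percentile(6.2, background)
middle = top_percentile(-0.9, background)
beyond = top_percentile(7.4, background)
print(best, middle, beyond)
```

Sorting once and using binary search keeps each lookup cheap, which matters when the background set contains on the order of a million scored sites.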
Peptide sequences were examined that surrounded 1,071,932 Ser and Thr
residues found in proteins encoded by 15651 human genes catalogued in the
human reference sequence (RefSeq) collection maintained by the National
Center for Biotechnology Information. The sequence of each protein was
scanned to identify each residue that could be phosphorylated on Ser or Thr.
The sequence surrounding each of these sites was used to calculate a raw score
for that site for each PSSM. The distribution of scores was determined, as
illustrated, for example, in FIG. 19 for the PKC-theta PSSM. The median score
for all these proteins was -0.9.
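The scanning step described above can be sketched as a generator that yields the sequence window around each Ser or Thr. This is an illustrative sketch: the function name, the short test protein, and the use of "-" to pad windows that run past a protein terminus are assumptions, with the P-7 to P+6 window size taken from the MARCKS example.

```python
from typing import Iterator, Tuple

def ser_thr_windows(protein: str, n_before: int = 7, n_after: int = 6,
                    pad: str = "-") -> Iterator[Tuple[int, str]]:
    """Yield (1-based position, window) for each Ser/Thr in `protein`.

    Each window spans P-n_before..P+n_after; positions beyond either
    terminus are filled with `pad`.
    """
    padded = pad * n_before + protein + pad * n_after
    for i, residue in enumerate(protein):
        if residue in "ST":
            # Residue i of the protein sits at index i + n_before in `padded`.
            yield i + 1, padded[i : i + n_before + 1 + n_after]

# Hypothetical short sequence used only to exercise the function.
windows = list(ser_thr_windows("MKRFSFKKT"))
for position, window in windows:
    print(position, window)
```

Each yielded window can then be scored against a PSSM and the scores pooled to build the background distribution.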

From this distribution, a percentile score was determined for any given
raw score. For example, a raw score of >2.8 corresponds to the top 5 percentile
and a raw score of >6.2 corresponds to the top 0.2 percentile of sites likely to
be phosphorylated by a selected kinase. Using this distribution, each score can
be assigned a percentile. For example, a raw score of 7.4 for the
KKKKKRF-S-FKKSFK (SEQ ID NO:80) sequence in MARCKS corresponds to the 0.04
percentile. Such a low percentile indicates that the KKKKKRF-S-FKKSFK
(SEQ ID NO:80) sequence in MARCKS is amongst the best candidate substrates
for PKC. Therefore, this kind of finding indicates that using the PSSM provided
by FIG. 5, FIG. 7, and FIG. 17, one of skill in the art can predict which
sequence within which protein is particularly likely to be phosphorylated by
PKC-theta.

In another embodiment, the invention provides methods for identifying
which sites in a protein of interest are likely to be phosphorylated by a
particular kinase, such as PKC-theta. FIG. 20 illustrates such an analysis for
the thirty-nine Ser and Thr residues in the protein MARCKS. The panel on the
left shows the percentile score for each of the thirty-nine residues. There is
only one region of the MARCKS protein in which PKC phosphorylation sites are
likely located. The panel on the right shows a portion of the analysis
corresponding to this most likely region. Each row shows a candidate site,
together with information on the position of the candidate site, and percentile
predictions for phosphorylation at the candidate position by three kinases
studied: PKC-theta, AKT1, and PKA. As shown in FIG. 20, two very strong
candidate sites exist for PKC-theta at P0 positions 159 and 163
(percentile <0.2). The values for AKT1 and PKA suggest that these are much less
likely to be sites for phosphorylation by those kinases. These sites are
precisely the two sites known to be physiologically relevant PKC
phosphorylation sites in MARCKS. This kind of validation has been reproduced
in a number of other molecules with known PKC phosphorylation sites, such as
alpha-, beta-, gamma-adducins, and GAP-43.

EXAMPLE 4: Identification of in vitro phosphorylation sites for PKC

Many peptides that are good substrates for PKC enzymes were identified
using the methods of the invention. For example, Tables 4 and 5 provide a
listing of peptides identified as potentially useful kinase substrates. The
locuslink identifier (NCBI) for the gene, the gene symbol and the peptide
sequence, together with results for phosphorylation by up to seven
different kinases, are provided in Tables 4 and 5. Five PKC isoforms were tested
using the methods described herein (see, e.g. Example 1): one classical PKC
isoform (PKC-alpha), three "novel" PKC isoforms (PKC-epsilon, PKC-delta and
PKC-theta) and one atypical PKC isoform (PKC-zeta). The data provided in
Tables 4 and 5 show that novel and classical PKCs exhibit similar
phosphorylation site preferences. In contrast to the general similarity of the
substrates selected by the four classical and novel PKC isoforms tested
(PKC-alpha, PKC-epsilon, PKC-delta and PKC-theta), a more distant PKC isoform
(PKC-zeta) and two other kinases in the same superfamily (AGC) show rather
different patterns of phosphorylation. Note that Table 5 includes data for two
different concentrations of substrate peptide during the assay (10 µM and
1 µM). Results are substantially similar at those two concentrations,
indicating that these findings on specificity are of general relevance and
pertain to phosphorylation over a broad range of substrate concentrations.
Quantitative analysis of correlations between phosphorylation of the
same substrate by different kinases is shown in FIG. 21. Such analysis confirms
the conclusions that the novel and classical PKC isoforms are very similar in
specificity, that there is greater divergence of the atypical PKC isoform
PKC-zeta, and that the other kinases of the same superfamily (AGC) are even more
divergent in specificity.
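One way to quantify such correlations is a Pearson correlation coefficient computed over the per-peptide phosphorylation values for each pair of kinases. The sketch below assumes that measure for illustration; the text does not state which statistic underlies FIG. 21, and the relative phosphorylation values are invented.

```python
from math import sqrt
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up relative phosphorylation of the same five peptides by three kinases.
signals = {
    "PKC-theta": [1.0, 0.8, 0.5, 0.2, 0.1],
    "PKC-delta": [0.9, 0.85, 0.45, 0.25, 0.1],
    "PKC-zeta": [0.2, 0.9, 0.1, 0.6, 0.3],
}

r_theta_delta = pearson(signals["PKC-theta"], signals["PKC-delta"])
r_theta_zeta = pearson(signals["PKC-theta"], signals["PKC-zeta"])
print(round(r_theta_delta, 2), round(r_theta_zeta, 2))
```

In this toy data the two novel isoforms correlate strongly while the atypical isoform does not, mirroring the qualitative pattern described above.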

Results in Table 2, Table 3, Table 4 and Table 5 demonstrate
phosphorylation by PKC of many of the peptides. As validated herein, the
methods of the invention predict that Ser and Thr residues within those peptides
are the preferred sites of phosphorylation. Table 6 lists sequences of peptides
in which pSer and pThr are present at positions corresponding to preferred PKC
phosphorylation sites in peptides phosphorylated by PKC. Phosphopeptides
included in Table 6 are only those corresponding to peptides whose efficiency of
phosphorylation by PKC is greater than or equal to 10% of the best substrate.
Such a cutoff is relatively stringent. It is more rigorous than many previous
methods in which the magnitude of phosphorylation is not compared with
reference positives.
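The inclusion rule described above amounts to a simple filter relative to the best substrate. The following sketch is illustrative only; the peptide labels and signal values are hypothetical and the function name is an assumption.

```python
from typing import Dict

def passes_cutoff(signals: Dict[str, float], fraction: float = 0.10) -> Dict[str, float]:
    """Keep peptides whose signal is at least `fraction` of the best substrate."""
    best = max(signals.values())
    return {name: value for name, value in signals.items()
            if value >= fraction * best}

# Hypothetical phosphorylation signals (arbitrary units) for four peptides.
measured = {"pep_A": 12000.0, "pep_B": 1500.0, "pep_C": 900.0, "pep_D": 300.0}

kept = passes_cutoff(measured)
print(sorted(kept))
```

Expressing the cutoff relative to the best substrate ties inclusion to a reference positive measured in the same experiment rather than to an absolute signal level.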
TABLE 6. Sequence of phosphopeptides
corresponding to preferred sites of PKC phosphorylation 



SEQ ID Locus- 
NO Link ID 



301 

302 

303 

304 

305 

306 

307 

308 

309 

310 

311 

312 

313 

314 

315 

316 

317 

318 

319 

320 

321 

322 



202 
286 



absent in 
melanoma 1 
ankyrin R 



695 BTK 

1105 CHD1 

casein kinase I 

1455 gamma 2 

1612 DAP-kinase 1 

1612 DAP-kinase 1 

1794 DOCK2 

1901 S1P1 receptor 

2870 GRK6 

3985 LIMK-2 



4033 JAW1 
4296 MLK3 

4296 MLK3 



4542 myosin IF 

4820 NKTR 

5128 PCTK2 

5339 Plectin 

prostaglandin E 

5734 receptor 4 

5777 SHP-1 

5778 HePTP 
5778 HePTP 



Sequence indicating 
site of 
phosphorylation 
RSGRRRG-pS- 

QKSTDS 
AQIVKRA-pS- 

LKRGKQ 
FERGRRG-pS- 

KKGSID 
SEGRRSR-pS- 

RRYSGS 
FKRRKRK-pS- 

LQRHK- 
IKKRRTK-pS- 

SRRGVS 
KKRRTKS-pS- 

RRGVSR 
PEVKLRR-pS- 

KKRTKR 
YSLVRTR-pS- 

RRLTFR 
GGNRKGK-pS- 
KKWRQM 
— LRRR-pS- 

LRRSNS 
RFSRRSS-pS- 

WRILGS 
RRGTFKR-pS- 

KLRARD 
HVRRRRG-pT- 

FKRSKL 
KKERRRN-pS- 

INRNFV 
TSSYRSR-pS- 

YSRSRS 
KKFKRRL-pS- 

LTLRGS 
RKTSSKS-pS- 

VRKRR- 
SDFRRRR-pS- 

FRRIAG 
DKEKSKG-pS- 

LKRK-- 
RALSFRQ-pT- 

SWLS- 
EQQRRAL-pS- 
FRQTSW 



Percentile 
prediction for N or C- 
PKC-theta term 



0.0 
0.3 



0.2 

0.1 

0.1 

0.0 

0.2 

0.1 

0.1 

0.5 
0.0 



0.6 
0.8 

0.0 



0.0 

0.7 

0.1 
0.5 

0.0 
2.0 
2.0 
3.0 



N 



N 



N 



N 
SEQ ID
NO 

323 

324 

326 

327 

328 
329 

330 

331 

332 

333 

334 

335 

336 

337 

338 

339 

340 

341 

342 
343 

344 
345 
346 
347 



Locus- 
Link ID 

7074 

9221 
9360 

9595 
9595 
9595 
9595 
23031 
23031 
23031 
25836 
25836 
26191 
65125 
65125 
65125 

65125 
393 



409 
119 

202 
395 



TIAM1 
nucleolar 
phosphoproteln 
p130 

cyclbphllin G 

PSCDBP 

PSCDBP 

PSCDBP 
PSCDBP 



MAST3 

MAST3 

MAST3 

IDN3 

IDN3 

Lyp 

WNK1 

WNK1 

WNK1 

WNK1 

ARHGAP4 

beta-arrestln2 
adducin gamma 
absent in 
melanoma 1 
ARHGAP6 



672 BRCA1 
672 BRCA1 



Sequence indicating 
site of 
phosphorylation 
QAMSRSA-pS- 
KRRSRF 

KTKKKRG-pS- 

YRGGSI 
KKKHRKN-pS- 

RKHK- 
FGTLPRK-pS- 

RKGSVR 
PRKSRKG-pS- 
VRKQ- 
SSSRRNR-pS-ISN— 
DFLRRSS-pS- 

RRNRSI 
-RMARR-pS- 

KRSRRR 
ETQDRRK-pS- 

LFKKIS 
MARRSKR-pS- 
RRRETQ 
RRRSQRI-pS-QRIT- 

SGVRRRR-pS- 

QRISQR 
VILRPSK-pS- 

VKLRSP 
RRRRPTK-pS- 

KGSKSS 
SGRRRRP-pT- 

KSKGSK 
RKSVRSR-pS-RHE- 

TKRHYRK-pS- 

VRSRSR 
AGPLRKS-pS- 

LKKGGR 
EKSHKRN-pS- 

VRLVIR 
TPSFLKK-pS-RK — 

RR-pS-GRRRGS 

DGQKRKK-pS- 

LRKKLD 
NRLRRKS-pS- 

TREQHA 
-NRLRRK-pS- 
STRHLH 



Percentile 
prediction for N or C- 
PKC-theta term 

0.6 



0.5 
0.0 

0.2 
0.0 
0.3 
0.3 
0.2 
0.4 
0.2 
0.0 
0.0 
0.6 
1.0 
0.0 
0.6 

0.0 
0.3 



0.5 
2.0 

0.5 N 
0.1 



0.1 
0.1 
SEQ ID
NO 



Locus- 
Link ID 



Sequence indicating 
site of 
phosphorylation 
-GSEGRR-pS- 



349 


1105 


CHD1 


RSRRYS 
RRRRRSR-pT- 


0.7 


350 


1196 


CLK2 


FSRSSS 
RRRSRTF-pS-RSSS- 


0.0 


351 


1196 


CLK2 


- 

YRWKRRR-pS- 


1.0 


352 


1198 


CLK3 


YSREHE 
LRRSKKR-pT- 


0.1 


353 


1794 


DOCK2 


KRSS- 


0.9 




2081 


ERN1 


KLAVGRH-pS- 


1.0 


354 






FSRRSG 






2081 


ERN1 


AVGRHSF-pS-RR — 


4.0 


355 




forkhead 
(Drosophila)-like 


-RERRER-pS- 




356 


2305 


16 

forkhead 
(Drosophila)-like 


RSRRKQ 
ERRERSR-pS- 


0.0 


357 


2305 


16 


RRKQHL 


0.4 




3797 


kinesin 3C 


KRPRRKS-pS- 


0.0 


358 






RRKK-- 






3797 


kinesin 3C 


GKRPRRK-pS- 


0.0 


359 






SRRKK- 






3985 


LtMK-2 


KATTKKR-pT- 


1.0 


360 






LRKNDR 






3985 


LIMK-2 


RRRSLRR-pS- 


0.5 


361 






NSISKS 






3985. 


LIMK-2 


RSLRRSN-pS- 


0.1 


362 






TOT/ C% T» 

ISKSPG 
DRFSRRS-pS- 




OOO 


4UOO 


1 A 1AM 
JAW! 


o WKlLijr 
-JJJvr oJKJK.-po- 


3.0 


364 


4033 


JAW1 


SSWRIL 


3.0 




4171 


MCM2; 


--VQRHR-pS- 


0.0 


365 






MRKTFA 






4763 


NF1 


AGSFKRN-pS- 


0.5 


366 






IKKIV- 
SYRSRSY-pS- 




367 


4820 


NKTR 


RSRSRG 
RSRSYSR-pS- 


2.0 


368 


4820 


NKTR 


RSRG-- 
RASSRST-pT-KKR- 


1.0 


369 


4863 


NPAT 


FRASSRS-pT- 


1.0 


370 


4863 


NPAT 


TKKR- 
FKRRLSL-pT- 


1.0 


371 


5128 


PCTK2 


LRGSQT 


1.0 



Percentile 
prediction for N or C- 
PKC-theta term 



C 
C 
N 
SEQ ID
NO 

372 

373 

374 

375 

376 

377 

378 

379 

380 

381 

382 

383 

384 

385 

386 

387 
388 
389 
390 
391 
392 
393 
394 



Locus- 
Link ID 

5339 Plectin 

5339 Plectin 

5587 PKD1 

5587 PKD1 

5590 pkc-zeta 

6840 Supervillin 

7074 TIAM1 
8436 serum deprivation 
response; 

8915 BCL10 
9020 NIK 

9101 ubiquitin specific 
protease 8 

9148 neurllzed-like 
9162 dag kinase iota 
PSCDBP 

9595 

9828 p164-RhoGEF 

10123 ADP-ribosylation 
factor-like 7 = 
ARL7 

10969 EBNA1BP2 

23227 WIAST4 

23227 MAST4 

25836 IDN3 
25865 PKD2 



26191 Lyp 
55357 PARIS1 



Sequence indicating 
site of 
phosphorylation 
--KRERK-pT- 

SSKSSV 
KKRERKT-pS- 

SKSSVR 
KHTKRKS-pS- 

TVMK-- 
VHYTSKD-pT- 

LRKRHY 
KSIYRRG-pS- 

RRWR-- 
NVMKRKF-pS- 

LRAAEF 
RSASKRR-pS- 

RFSS- 
-EKIKRS-pS- 

LKKVDS 
EISCRTS-pS- 

RKRAGK 
-WKGKRR-pS- 

KARKKR 
--RARRD-pS- 

LKKIEI 
--ALRRP-pS- 

LRREAD 
-NRKKKR-pT- 

SFKRKA 
DDFLRRS-pS- 

SRRNRS 
PRLIRRG-pS- 

KKRPAR 
MILKRRK-pS- 
LKQK- 

KRPGKKG-pS- 

NKRPGK 
MVRRSKK-pS- 

KKKESL 
-RMVRR-pS- 

KKSKKK 
EVSRPRK-pS- 

RKRVDS 
ARHGEK-pS- 

FRRSW 
-SVILRP-pS- 

KSVKLR 
EYLERRA-pS- 

RRRAV- 



Percentile 
prediction for NorC- 
PKC-theta term 
1.0 

1.0 

0.3 

3.0 



0.0 



0.5 

2.0 
3.0 



4.0 
0.8 

1.0 

0.5 
0.6 



1.0 
0.0 

0.0 
1.0 

0.5 

0.2 

0.4 
0.2 



0.9 
0.2 
SEQ ID
NO 

395 

396 

397 

398 

399 

400 

402 

403 
404 
405 

406 

407 

408 
409 

410 

412 

413 
414 

415 

416 

417 

418 
419 

420 

421 

422 



Locus- 
Link ID 

55672 FLJ20719 

55672 FLJ20719 

57082 AF15q14; 

57731 spectrin, beta, 

non-erythrocytic 4 

65125 WNK1 

672 BRCA1 

1196 CLK2 

1196 CLK2 

1196 CLK2 

1198 CLK3 

1198 CLK3 

1612 DAP-kinase1 

1612 DAP-kinase1 

1794 DOCK2 

2081 ERN1 

2837 Urotensin-2 
receptor 

2837 Urotensin-2 
receptor 

3985 LIMK-2 

3985 LIMK-2 

4171 MCM2; 

4171 MCWI2; 

4763 NF1 

4820 NKTR 

4863 NPAT 

4863 NPAT 

5587 PKD1 



Sequence indicating 
site of 
phosphorylation 
KKRRGRR-pS- 

TKKRRR 
KRRGRRS-pT- 

KKRRRR 
SKSQRRK-pS- 

LKLK-- 
EGGDRRA-pS- 

GRRK-- 
EYRRRRH-pT- 

MDKDSR 
RLRRKSS-pT- , 

RHIHAL 
-RRRRRR-pS- 

RTFSRS 
RSRTFSR-pS- 

SSMK-- 
RTFSRSS-pS-MK~~ 

-pS-YRWKRR 

WKRRRSY-pS- 

REHEGR 
-FIKKRR-pT- 

KSSRRG 
KSSRRGV-pS-RE~- 

SKKRTKR-pS-S 

RHSFSRR-pS-GV— 

YRRSQRA-pS- 

FKRA-- 
LARAYRR-pS- 

QRASFK 
— -KA-pT-TKKRTL 
— KAT-pT- 

KKRTLR 
RHRSMRK-pT- 

FARYLS 
KTFARYL-pS- 

FRRD- 
QKQRSAG-pS- 

FKRNSI 
RSYSRSR-pS-RG — 
NTQQFRA-pS- 

SRSTTK 
TQQFRAS-pS- 

RSTTKK 
VKHTKRK-pS- 

STVMK- 



Percentile 
prediction for 
PKC-theta 

0.0 

0.0 
0.0 

0.9 



0.4 

1.0 

0.0 

2.0 
2.0 
2.0 

2.0 

1.0 

1.0 

2.0 
4.0 

0.0 

0.1 

2.0 
4.0 

2.0 

2.0 

1.0 

5.0 

2.0 

3.0 
5.0 



Nor C- 
term 



N 
Sequence indicating


Percentile 


SEQID 


Locus- 




site of 


prediction for 


NO 


Link ID 




phosphorylation 


PKC-theta 




5587 


PKD1 


HTKRKSS-pT- 


4.0 


423 






VMK — 






5587 


PKD1 


— RWQ-pS- 


1.0 


424 






VKHTKR 






6429 


SFRS4 


KSKDKRK-pS- 


0.2 


425 






RKRS- 




426 


6429 


SFRS4 


KRKSRXR-pS 


0.6 




6429 


SFRS4 


RSRSRSK-pS- 


0.4 


427 






KDKRKS 






6429 


SFRS4 


RSRSRSR-pS- 


0.3 


428 






KSKDKR 




429 


6429 


SFRS4 


— RSR-pS-RSRSKS 


0.6 




6429 


SFRS4 


~RSRSR-pS- 


0.6 


430 






RSKSKD 




6594 




ARIQRRI-pS-DCKA- 


0.1 


431 




SNF2L 


- 










APLRRRE-pS- 




432 


6650 


SOLH 


MHVEQR 


0.0 




7273 


Titin 


DKKQIRS-pS- 


2.0 


433 






KKYR-- 






7273 


Tltln 


KDKRQLR-pS- 


0.7 


434 






SKKYK- 






8436 


serum deprivation 


SSLKKVD-pS-LKK- 


5.0 


435 




response; 


— 






8567 


MADD 


. SVRQRRM-pS- 


1.0 


436 






LRDD-- 










SRSRHRL-pS-RSR- 




437 


8621 


CDC2L5 


- 


0.1 








-SSRHSR-pS- 




438 


8621 


CDC2L5 


RSRHRL 


0.9 








YSRRRSP-pS- 




439 


8621 


CDC2L5 


YSRHSS 


0.3 








SRHSRSR-pS- 




440 


8621 


CDC2L5 


RHRLSR 


0.4 








-RDRGRR-pS- 




A A -i 

441 


8899 


PRP4 


T>CDT T> T> 


0.1 


442 


8899 


PRP4 


RSRLRRR-pS-RS — 


0.1 








RGGRRRR-pS- 




443 


8899 


PRP4 


RSKVKE 


0.0 








TTKKRSK-pS- 




444 


8899 


PRP4 


RSKERT 


0.4 








DRGRRSR-pS- 




445 


8899 


PRP4 


RLRRRS 


0.1 


446 


8899 


PRP4 


RLRRRSR-pS 


0.6 








GRRRRSR-pS- 




447 


8899 


PRP4 


KVKEDK 


0.0 


448 


9020 


NIK 


-KKRKKK-pS- 


2.0 



Nor C- 
term 
SEQ ID
NO 



449 
450 

451 

452 

453 

454 

455 

456 

457 

458 

459 

460 

461 

462 
463 

464 

465 

466 

467 
468 



470 
471 



Locus- 
Link ID 

9020 
9088 

9221 

9221 
9360 

9590 

i 

9590 

9595 

9934 

9934 

9934 

9934 

9934 

23031 
26191 
55357 

55762 

56000 

57468 

79142 
79142 
79877 
94121 



NIK 

Myt1 kinase 

nucleolar 

phosphoprotein 

p130 

nucleolar 

phosphoprotein 

p130 

cyclophilin G 
Gravin 
Gravin 
PSCDBP 

GPR105 

GPR105 

GPR105 

GPR105 

GPR105 

MAST3 
Lyp 

PARIS1 
FLJ10891: 



nuclear RNA 
export factor 3 
solute carrier 
family 12 member 
5 

MGC2941: 



Sequence indicating 
site of 
phosphorylation 

SKSLAH 
KKRKKKS~pS- 

KSLAHA 
QLQPRRV-pS- 
FRGE— 

EK-pT- 

KKKRGS 

RGSYRGG-pS-ISV- 

— TEEK-pS- 
KKRKKK 
AGWRKKT-pS- 

FRKP-- 
-AGWRKK-pT- 

SFRKPK 
-DDFLRR-pS- 
SSRRNR 
STSVKKK-pS-SRN- 

TSVKKKS-pS-RN— 

KSSRNST-pS- 

VKKKSS 
LKSSRNS-pT- 

SVKKKS 
-LKSSRN-pS- 

TSVKKK 
KRSRRRE-£T-QDR- 

VKLRSPK-pS- 



MGC2941 
FLJ22955 
slp4 



EYLERRA-pS- 

RRRAV- 
ARPKTRI-pS- 

NKYR-- 
SPYNRKG-pS- 
FRKQ- 
ITDESRG-pS-IRRK- 

PGDGEKR-pS- 

RIKKSK 
KRSRKK-pS- 

KKRK-- 
ARLMRRN-pS- 

LNRK-- 
-RQGKRK-pT- 



Percentile 
prediction for N or O 
PKC-theta term 

1.0 
1.0 



1.0 

0.6 
1.0 

0.4 

0.3 

3.0 

2.0 

2.0 

0.3 

2.0 

1.0 

0.1 
4.0 
0.2 

0.8 

0.1 

2.0 

2.0 
0.0 
0.0 
1.0 
Sequence indicating


Percentile 




SEQID 


Locus- 




site of 


prediction for 


NorC- 


NO 


Link ID 




phosphorylation 
SIKRDT 


PK(>theta 


term 




94121 


slp4 


RQGKRKT-pS- 


0.4 




472 






IKRDTV 








9162 


dag kinase iota 


NRKKKRT-pS- 


0.0 




473 






FKRKA- 
GTIRSKL-pS- 


0.1 




571 


547 


ATSV/KIF1A 


RRRSAQ 
SKLSRRR-pS- 


0.4 


c 


572 


547 


ATSV/KIF1A 


SAQMRV- 
PGRRRHR-pS- 


0.0 


c 


573 


10921 


RNPS1 


RSSSNS 
RRRHRSR-pS- 


0.2 


c 


574 


10921 


RNPS1 


SSSNSSR 
SGVRRRR-pS- 


0.0 


c 


575 


25836 


IDN3 


QRISQR 
RRRSQRI-pS- 


0.1 


c 


576 


25836 


IDN3 


SQRU- 
FFSLRRK-pS- 


0.3 


c 


577 


1608 


dag kinase gamma 


RSKD-- 
PQKSSFF-pS- 


2.0 


c 


578 


1608 


dag kinase gamma 


SLRRKSR 
SSLAQRR-pS- 


0.1 


c 


579 


27330 


P90-RSK6 


MKKRTS 
RSMKKRT-pS- 


i 

1.0 


c 


580 


27330 


P90-RSK6 


STGL— 
YSVKRKK-pS- 


0.0 


c 


581 


9014 


TAF1B 


RSKKVR 
VKRKKSR-pS- 


0.2 

i 


c 


5§2 


9014 


TAF1B 

spectrin, beta, non- 


SKKVRRH 
REREKRF-pS- 


0.2 


c 


583 


6712 


erythrocytic 2 


FFKKNK 
NERLRRE-pS- 


2.0 


c 


584 


941 


CD80 

casein kinase I 


VRPV- 
FKRRKRK-pS- 


0.0 


c 


585 


1455 


gamma 2 


LQRHK- 
RTRHARH-pT- 


0.1 


c 


586 


6621 


SNAPC4 


RKRRRL 
RRGGRRR-pS- 


0.3 


c 


587 


9939 


RBM8A 


RSPDRR 
GGRRRSR-pS- 


2.0 


c 


588 


9939 


RBM8A 


SPDRRRR 
KRKRTRP-pT-KSS- 


0.5 


c 


589 


6158 


RPL28 


KRTRPTK-pS-SS— 


2.0 


c 


590 


6158 


RPL28 


KRRLRTK-pT-AK- 


0.4 


c 


591 


9585 


MPP1 






c 






WO 2005/028666 



PCT/US2004/029397 



SEQ ID Locus- 
NO Link ID 

592 9585 MPP1 

593 5336 PLCG2 

594 55762 FLJ10891 

595 2889 RAS-GRF2 

596 2889 RAS-GRF2 

597 117532 TMC2 

598 117532 TMC2 

599 11215 AKAP220 

600 22899 ARHGEF15 

601 10788 IQGAP2 

602 10788 IQGAP2 

603 1620 DBCCR1 

604 1620 DBCCR1 

605 9595 PSCDBP 

606 9595 PSCDBP 

607 9656 NFBD1 

608 9656 NFBD1 

609 785 CACNB4 

610 785 CACNB4 

KIAA0296 gene 

611 9726 product 

612 54221 SNTG2 

613 54221 SNTG2 

614 22947 DUX4 

615 22947 DUX4 



Sequence indicating 


Percentile 




site of 


prediction for 


NorO 


phosphorylation 


PKC-theta 


term 


HKRRLR-pT- 


0.5 




TKTAK— 




c 


REKRVSN-pS- 


2.0 




KFYS— 




c 


ERHHRLH-pT- 


3.0 




GKKS- 




c 


-KPRNI-pT- 


1.0 




RRKTDR 




c 


RNTTRRK-pT- 


1.0 




TDREEKT 




c 


DRLGRRS-pS- 


0.1 




SKRALK 




N 


RLGRRSS-pS- 


0.5 




SKRALKA 




N 


NHMKTKA-pS- 


0.1 




VRKSFS 




N 


IIRPRPPSR-pS- 


5.0 




RAAQ 




N 


KRKNTRR-pS- 


0.1 




DCLDG- 






DNLKRKN-pT- 


1.0 




TRRSIKL 






PRWRKRM-pS- 


0.1 




LTLKSN 






WRKRMSL-pT- 


3.0 




TLKSNKN 






SSSRRNR-pS-IS — 


0.4 




DFLRRSS-pS- 


0.5 




SRRNRSI 






— TSRA-pT- 


0.1 




RRKTNR 






SRATRRK-pT- 


0.1 




TNRSSVK 






HNERARK-pS- 


0.5 




RNRLSS 






RKSRNRL-pS- 


0.8 




SSSS— 






RAYRHRG-pS- 


0.0 




LVNHRH 






GRNRRTV-pT- 


0.1 




LRRQPV 






HQGRNRR-pT- 


0.2 




TVTLRRQ 






SRRPPRR-pS- 


0.1 




RSRRPG 






RPPRRSR-pS- 


0.1 




SRRPGLH 
SEQ ID
NO 

616 

617 

618 

619 
620 

621 

622 

623 

624 

625 

626 

627 

628 

629 

630 

631 

632 

633 

634 

635 

636 

637 

638 

639 



Locus- 
Link ID 

23524 

23524 

8471 

8471 
4926 

4926 

2318 

9656 

9656 

11214 

9744 

10129 

9656 

9656 

9656 

4690 

4690 

862 

862 

1793 

1793 

8826 

8826 

926 



SRm300 

SRm300 

IRS4 

IRS4 
NUMA 

NUMA 

gamma-filamin 
NFBD1 
NFBD1 
AKAP13 

centaurin beta 1 

hypothetical protein 
CG003 

NFBD1 

NFBD1 

NFBD1 

NCK1 

NCK1 

CBFA2T1 

CBFA2T1 

DOCK1 

DOCK1 

IQGAP1 

IQGAP1 

CD8beta 



Sequence indicating 
site of 
phosphorylation 
-RKARL-pS- 

RRSRSA 
KARLSRR-pS- 

SRSASSS 
HLPRGRR-pS- 

RRAVSV 
RRSRRAV-pS- 
SVPA— 
RSARRRT-pT-QI — 
TRSARRR-pT- 
TTQI— 
TRTFTRS-pS-HTY~ 

RGRKNRS-pS- 

VKTPET 
TRGRKNR-pS- 

SSVKTPE 
TKVSRTF-pS- 

YIKNKM 
SDRPRPG-pS- 

LRSKPE 
ERSRHQR-pS- 

FSVPKK 
PNRIPSR-pS- 

LRRTKL 
PSRSLRR-pT- 

TKLNQ- 
PKIRTRK-pS- 

SRMTPF 
KNSARKA-pS- 

IVKNLK 
— RKN-pS- 

SARKASI 
EKTRRSL-pT- 

VLRRAQ 
3VTVEKTRR-pS- 

SLTVLRR 
GYTLRKK-pS- 

KKG — 
EGWYRGY-pT- 

TLRKKSK 
REMKGKK-pS- 

KKISLK 
GKKSKKI-pS- 

SLKYT- 
AQPTKKS-pT- 

LKKRVA 



Percentile 
prediction for N or C- 
PKC-theta term 
0.2 

0.4 

0.4 

0.7 

0.2 
0.4 

2.0 

0.4 

0.7 

3.0 

1.0 

0.1 

0.5 

3.0 

0.0 

0.3 

1.0 

0.8 

1.0 

0.5 

1.0 

0.1 

0.5 

1.0 
SEQ ID Locus-
NO Link ID 

640 926 CD8beta 

641 10198 MPHOSPH9 

642 10198 MPHOSPH9 

643 8842 CD133 

644 8842 CD133 



Sequence indicating 
site of 
phosphorylation 
TAQPTKK-pS- 

STLKKRV 
MLSLRHN-pS- 
RIHVRP 
SRIHVRP-pS-SR- — 
VRTRIKR-pS- 

RKLADS 
— QVR-pT- 
TRDCRSR 



Percentile 
prediction for N or 
PKC-theta term 
2.0 

2.0 

2.0 
0.1 

3.0 



EXAMPLE 5: Analysis of different kinases using the same superset 

In many embodiments of the invention, the same superset of test peptides
can be used to study the substrate specificity of a variety of different kinase
enzymes. The anchor residue(s) and phosphorylatable residue in a test set (or
superset, or collection) of peptides must be appropriate to the particular kinase
whose specificity is being analyzed. However, a wide diversity of peptide
sequences is available in the test sets, supersets, or collections of peptides
provided by the invention. It is also fortunate that the results obtained to date
indicate that there is sufficient similarity between the substrate specificities of
different kinases that a single set (or superset, or collection) of peptide pools can
be used to study the specificity of different kinases. Hence, for example, kinases
of the protein kinase C family are sufficiently closely related that successful
studies with other members of this family can be performed on the same or
similar test sets of peptides. This was shown by studies in which one or both
of the supersets of peptides designed for PKC were successfully used to analyze
related kinases such as PKC-zeta, Protein Kinase A (PKA) and Protein Kinase G
(PKG). See FIG. 22 and FIG. 25.

FIG. 22 shows PSSM Logos for PKC-zeta and PKA derived by
analyzing those kinases with the same peptide supersets used for analysis of
PKC-theta. Because the sequence of PKC-zeta is similar to the PKC-theta
sequence, PKC-zeta was expected to have fundamental similarities in substrate
specificity. Those expectations were confirmed by the PSSM Logo
representation of the data. One of the most prominent differences between
PKC-theta and PKC-zeta was the preference for a hydrophobic amino acid (e.g.,
phenylalanine, F) at P-5. This characteristic preference of PKC-zeta was
confirmed using the methods of the invention and was further validated by
previous tests (Nishikawa K, Toker A, Johannes FJ, Songyang Z, Cantley LC.
1997. J Biol Chem 272:952-960). Similarly, PKA has a strong preference for
positively charged residues in positions P-2 and P-3 (FIG. 22), as previously
shown (Kreegipuu A, Blom N, Brunak S, Jarv J. 1998. Statistical analysis of
protein kinase specificity determinants. FEBS Lett 430:45-50).

Predictions were made as to which amino acids would occupy what
positions in the phosphorylation substrate recognized by PKC-zeta. These
predictions were then tested by measuring PKC-zeta mediated phosphorylation
of the same set of proteomic peptides that were tested for PKC-theta. The results
for this testing are shown in FIG. 23 (panel a) and demonstrate that the PKC-zeta
prediction was excellent. The quality of the prediction was affirmed by the
comparison with the results of predictions by Scansite for PKC-zeta (FIG.
23, panel b). Problems with the Scansite prediction were evident from the
finding that the best peptide has a score of >4th percentile and several other of
the better substrates also have scores >4th percentile.

Given the similarity between the PSSM Logo for PKC-zeta and PKC-theta,
it was possible that the good results for PKC-zeta and PKC-theta are
redundant, and that nothing new has been learned from PKC-zeta. That
possibility was addressed in two ways. First, the data were checked to ascertain
whether PKC-delta/theta and PKC-zeta were equivalent in their phosphorylation
of the set of proteomic peptides. Results in FIG. 23 (panel c) show that although
there was a general correlation between the phosphorylation patterns of those
different kinases, there were also substantial differences. Therefore, an analysis
was performed on whether the PKC-zeta prediction would satisfactorily predict
phosphorylation by PKC-delta. The results in FIG. 23 (panel d) demonstrate that
PKC-zeta predictions would not. Thus predictions from the PKC-zeta PSSM
predict well phosphorylation by PKC-zeta but not PKC-theta while predictions
from the PKC-theta PSSM predict well phosphorylation by PKC-theta (and
PKC-delta). These findings strongly validate the high degree of specificity
provided by the methods of the invention.
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Further investigations were performed to ascertain what residues may 
account for differences between substrates in the predicted phosphorylation by 
PKC-theta and PKC-zeta. FIG. 24 provides a detailed analysis of the scoring for 
the six substrates whose behavior contributed most to the mismatch in FIG. 23, 
panel d (and corresponding match in FIG. 23, panel a). Scoring for those 
peptides with the PKC-theta and PKC-zeta predictions were tabulated. Residues 
that showed the biggest improvement in score with PKC-zeta relative to PKC- 
theta were identified (difference >0.5) and are underlined. Better recognition by 
PKC-delta could be due to a favorable residue for PKC-delta recognition that is 
less favorable for PKC-zeta recognition (referred to herein as "control by 
favorable residue"), or to neutral residue for PKC-delta recognition being 
unfavorable for PKC-zeta recognition ("control by unfavorable residue"). The 
results indicate that much of the poorer recognition by PKC-zeta was due to at 
least one unfavorable residue. For example, the six biggest changes in score for 
each peptide have been boxed in black in FIG. 24. Five of those six changes are 
from a residue slightly unfavorable for PKC-theta to a residue very unfavorable 
for PKC-zeta. This is best illustrated by peptides 2 and 3, which have a proline 
at -5 that was slightly unfavorable for PKC-theta and very unfavorable for PKC- 
zeta. The strongly disfavored proline at -5 for PKC-zeta (but not for PKC-theta) 
can be seen in FIG. 22. This principle is similarly illustrated by the peptide 1, 
which has an isoleucine at P-l (predicted as being disfavored based on the 
results for leucine with PKC-zeta, FIG. 22) and peptide 5, which has an W at P-5 
(strongly disfavored by PKC-zeta, FIG. 22). 
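
The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the representation of a PSSM as a dictionary, the "neutral" threshold of 0.5 used to separate the two kinds of control, and all score values are assumptions made for the example and are not taken from FIG. 22 or FIG. 24. Only the difference cutoff of 0.5 comes from the text.

```python
from typing import Dict, List, Tuple

# A PSSM is modeled here as (position, residue) -> log2 score.
# Missing entries are treated as 0 (neutral). This layout is an assumption.
PSSM = Dict[Tuple[int, str], float]


def compare_scores(peptide: Dict[int, str], better: PSSM, worse: PSSM,
                   min_diff: float = 0.5,
                   neutral: float = 0.5) -> List[Tuple[int, str, float, str]]:
    """Flag residues that score higher for `better` than for `worse`.

    Returns (position, residue, difference, label) for every residue whose
    score difference exceeds min_diff. The label is chosen by whether the
    residue is clearly favorable for the better-recognizing kinase.
    """
    flagged = []
    for pos, res in sorted(peptide.items()):
        s_better = better.get((pos, res), 0.0)
        s_worse = worse.get((pos, res), 0.0)
        diff = s_better - s_worse
        if diff > min_diff:
            if s_better > neutral:
                label = "control by favorable residue"
            else:
                label = "control by unfavorable residue"
            flagged.append((pos, res, diff, label))
    return flagged


# Hypothetical scores for illustration only.
theta = {(-5, "P"): -0.3, (-3, "R"): 1.2, (1, "F"): 1.5}
zeta = {(-5, "P"): -2.0, (-3, "R"): 1.1, (1, "F"): 0.4}
peptide = {-5: "P", -3: "R", 1: "F"}
result = compare_scores(peptide, theta, zeta)
```

With these made-up numbers the proline at P-5 is flagged as control by unfavorable residue (slightly unfavorable for one kinase, strongly unfavorable for the other), while the arginine at P-3 is not flagged because the two scores differ by less than the cutoff.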

Control of kinase specificity by unfavorable residue(s) was also strongly
suggested by the findings that PKA, PKC-theta and PKC-zeta all strongly
disfavor proline at P+1 (FIG. 22). This contrasts sharply with the preferences of
another major class of kinase, the proline-directed kinase, for which a proline at
P+1 is a critical residue. Thus, an important part of the reciprocal specificity
between the basophilic kinases and the proline-directed kinases (such as CDK1)
is that proline at P+1 was disfavored by the former and favored by the latter.
Thus, "control by unfavorable residue" appears to be a major element in kinase
specificity. This is important, because the methods of the invention can be very
accurate at quantifying unfavorable recognition. Many of the prior art
techniques may not be ideal for determining strength of unfavorable recognition;
for example, the methods disclosed in U.S. Patent 6,004,757 may be limited in
doing so by reason of limitations in amino-acid sequencing.

EXAMPLE 6: Analysis of mutant kinases 
In another embodiment, the methods of the invention can be used to
analyze the substrate specificity of mutant kinases. A major strategy for
analyzing protein structure and function involves deriving mutant constructs,
expressing them, and determining how the mutation influences the function
and/or specificity of the resulting mutant protein. Given the previous difficulty
in assessing kinase specificity, there have been no prior studies that
systematically analyze the specificities of mutant kinases. However, the
methods of the invention can be used for this purpose.

For example, more than ten mutant constructs of PKC-theta have been
made and analyzed by the inventor using the present methods to ascertain what
types of specificity changes occur. Results of some of the more informative
constructs are shown as PSSM logos in FIG. 26. Because only changes in
substrate specificity were assessed and not changes in auto-inhibition resulting
from altered binding of pseudo-substrate, the parental PKC-theta construct
used was one that had previously been mutated to a constitutively active form by
mutating the pseudo-substrate (A148E), shown in FIG. 26. Results are shown
for four constructs in which an acidic residue in the catalytic cleft has been mutated
(FIG. 26).

The most striking finding amongst the constructs studied was deviation
of construct D465A from the overall pattern of substrate specificities shared by
wild type PKC-theta (FIG. A), constitutively active A148E (FIG. 26) and the three
other mutant constructs derived from constitutively active A148E (D544A,
D508A, E571I, FIG. 26). The differences observed in D465A specificity
compared to other PKC-theta enzymes are: 1) the shape of the PSSM Logo (i.e.
relative height of individual columns) and 2) the general position of individual
residues in particular columns.

Regarding the shape of the PSSM Logo, a feature absolutely conserved
amongst constructs other than D465A was that the P+2 position was always the
tallest. Usually the P+1 position was the second tallest and there was wobble as
to which of the other positions was third tallest. However, mutant D465A was
strikingly different. Position P+2 of the preferred substrate for the D465A
mutant has dropped from the most prominent to one of the three least prominent
and the P+1 position has likewise dropped in prominence. Taken together these
data indicate that the D465A mutant has a marked reduction in reliance on the
usual C-terminal residues that typically guide substrate specificity in all other
kinase constructs.

A detailed understanding of kinase specificity requires understanding of
the residues favored at each position. PSSM Logos (FIG. 26) also reveal that the
strong preferences and lack of preferences of the wild type construct for residues
at particular positions were typically conserved amongst most mutant kinase
constructs. These generally include: 1) a preference for basic residues at each
position; 2) an absolute preference for a hydrophobic residue that exceeds the
preference for basic residues at the P+1 position (and occasionally P+3); 3) a
strong disfavor for aspartic acid ('D') at most positions; 4) a strong dislike for
hydrophobic residues at P-2; and 5) a strong disfavor for proline ('P') in a
C-terminal position. As with the overall shape, D465A was also an outlier with
regard to these preferences and disfavors. Note particularly the moderation, or
reversal, in preference for the typically disfavored 'P' and 'D' residues in the
C-terminal positions of the substrate.

The marked changes in preference of the D465A mutant toward the
C-terminal residues were not anticipated. However, it is known that the side chain
of D465 coordinates with ATP. Consequently truncating the side chain of D465
would be expected to perturb some aspect of ATP binding or function. No major
change in the Km for ATP, however, was revealed by analysis of the kinetic
parameters for D465A. Therefore, ATP contact with the remainder of the ATP
pocket within the enzyme may be sufficient for good binding in D465A.
However, the conformation of the enzyme's N-lobe may be abnormal due to a
lack of favorable interaction between the D465 side chain and other elements in
the N-lobe. This incomplete closure would be expected to alter the "closed
conformation" that the enzyme usually adopts during catalysis, and alter
movement of alphaC towards the activation loop.

EXAMPLE 7: Analysis of different assay conditions with methods of the
invention

Tests were performed on a wild type kinase to examine whether low ATP
concentrations would favor an ordered reaction in which a peptide binds first in
the absence of ATP, and subsequent loading of ATP rapidly proceeds to
catalysis. The PSSM Logo for such an assay is shown in FIG. 26. This
PSSM Logo for low ATP reveals a distortion of shape that bears substantial
resemblance to the D465A PSSM Logo. Specifically, there were decreases in
height of the P+2 and P+3 columns that are even more marked than those
observed with D465A. Moreover, like D465A, the low ATP profile has lost
many of the characteristic preferences of the other constructs at these positions
(see below).

Visualization of D465A preferences at individual positions was
facilitated by the graphical analysis shown in FIG. 27, which shows data for the
eight most informative residues at four particularly informative positions.
Positions P-2 and P-3 are shown in part because those are the peptide positions
at which the greatest changes resulting from point mutations of acidic residues
were anticipated. Positions P+2 and P+3 are shown because they are the location
of many of the biggest changes in D465A and low ATP conditions. The most
striking finding was the similarity in residue preference that occurs with D465A
and low ATP, but not for other mutants. There were fifteen such changes,
denoted with solid arrows below the x-axes in FIG. 27. Amongst these changes,
five occur in the N-terminal P-2 and P-3 positions. Two of these N-terminal
changes were ones that had been predicted, namely decreased preference for H at
P-3 and decreased disfavor for D at P-3. The failure to see decreased preference
for R or K at P-3 suggests that conformational flexibility allows binding of the
P-3 substrate residue to residues other than D465 in the cleft (most likely D544
or D508).

The correlation between the D465A and low ATP changes in the
C-terminal region of the substrate was striking. In almost all cases the changes in
substrate preference observed for D465A involve neutralization of the strong
preferences (either negative or positive) observed for related kinases. In
contrast to D465A, changes in substrate preference for the other three point
mutants are quite modest both in number and magnitude of change. However,
some changes in substrate preference for the D508A mutant bear similarity to
those found in D544A (denoted with dashed arrows above the axes in FIG. 27).
Both have lost their disfavor for D at the P-2 position (consistent with repulsion
by nearby residues). Both also show a modest decrease in preference for R, not
only at P-2 but also at P-3.

The methods of the invention are therefore informative not only for 
studying the specificities of mutant kinase constructs, but also for analyzing 
changes in kinase specificity resulting from different assay conditions. It can be 
easily appreciated by one of skill in the art that the present methods would be 
useful in analyzing the importance of other assay conditions, such as ion
concentration (Ca++, Mg++, H+), and temperature. The present methods would 
also be useful in determining whether addition of other molecules to the assay 
influenced peptide specificity, for example by allosteric effects. 

EXAMPLE 8: Further understanding of anchor residues and their 
variations in test sets 
Understanding of substrate specificity usually requires understanding the
residue preferences at every position close to the phosphorylation position. The
problem related to establishing anchor positions is that positions that are chosen
as anchor residues in a set cannot, by definition, also be query or variable
positions in that set. For example, the peptide test set Rxx-S-F uses anchor
residues at positions P-3 and P+1. Therefore, information on the P-3, P0 and
P+1 positions cannot be obtained from the Rxx-S-F test set. In the embodiment
shown in FIG. 2, the P-3, P0, and P+1 positions were analyzed by using
diminished numbers of anchor residues. For example, for the P+1 test set, the
anchor at P-3 was retained, but the P+1 position was used as the query position
(variable residue). Note that the methods of the invention provide strategies for
designing and using a variety of test sets that could determine information about
the residue preference for PKC-theta at the P+1 position. FIG. 28 illustrates
results with such varied test sets used for analysis of specificity of PKC-theta;
each column of the PSSM logo represents results with a single test set and the
symbolic representation of that set is shown below the column. Consider for
example residue preference at the P+1 position, which our experience with the
methods of the invention indicates is particularly important. Residue scores
determined for that position vary depending on the number (and position) of the
anchor residues used in the test set. Also note that the results differ significantly
for test sets in which the phosphorylatable residue is T rather than S. For one
skilled in the art, the methods of the invention provide many strategies to refine
the definition of specificity for a kinase. For example, because the P+1
preferences for threonine phosphorylation differ from those for serine
phosphorylation, one can create test sets analogous to those shown in FIG. 2, but
using T as the phosphorylatable residue. Results with those peptides would
allow more precise predictions, because they would be tailored specifically to
relevant subsets of peptide substrates.
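
The relationship between anchor positions and query positions can be illustrated with a short Python sketch. It assumes, for illustration only, that each pool in a test set fixes one residue at the query position and leaves every non-anchor position degenerate ('x'), following the Rxx-S-F notation in which R sits three residues before the serine and F one residue after it. The function name, its parameters and the peptide span are hypothetical and are not taken from FIG. 2 or FIG. 28.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_test_set(anchors, query_pos, first=-3, last=1, phospho="S",
                   residues=AMINO_ACIDS):
    """Return one symbolic peptide pool per query residue.

    anchors:   dict mapping a position (relative to P0) to a fixed residue.
    query_pos: the position whose residue is varied across the set.
    Positions that are neither anchor nor query are written as 'x'.
    """
    if query_pos == 0 or query_pos in anchors:
        raise ValueError("query position cannot be P0 or an anchor position")
    if not first <= query_pos <= last:
        raise ValueError("query position lies outside the peptide")
    pools = []
    for res in residues:
        symbols = []
        for pos in range(first, last + 1):
            if pos == 0:
                symbols.append(f"-{phospho}-")
            elif pos == query_pos:
                symbols.append(res)
            else:
                symbols.append(anchors.get(pos, "x"))
        pools.append("".join(symbols))
    return pools


# Rxx-S-F style set: anchors at P-3 and P+1, querying P-2.
p_minus2_set = build_test_set({-3: "R", 1: "F"}, query_pos=-2)
# P+1 test set: the P+1 anchor is given up so that P+1 can be queried.
p_plus1_set = build_test_set({-3: "R"}, query_pos=1)
```

Passing phospho="T" produces the analogous threonine sets. Asking for a query position that is also an anchor raises an error, which mirrors the constraint that an anchor position cannot also be a variable position in the same set.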

FIG. 29 illustrates results with another superset of test sets of peptide
pools based on a single anchor residue of R at P-3 and threonine as the
phosphorylatable residue. Results shown are for the kinase ROK-alpha, about
which there is little general understanding of specificity in the literature. This
superset is designed as a screening set to ascertain gross preferences from which
to choose an additional anchor position. For that reason, it was most economical
to only include 4 query residues: R, E, L and F, which our experience indicates
are particularly important anchor residues. Even this limited analysis shows a
strong overall preference for R, indicating ROK is clearly a "basophilic kinase".
The only position tested which has a dominant hydrophobic preference is P+3.
One practiced in the art of this invention can appreciate that the third anchor
position for a full test set of peptides should most likely be an 'R' at the P-4 or
P-5 positions, where it has the strongest preference and where there are no other
favorable residues.


EXAMPLE 9: Querying by Fixed Residue at Varied Positions rather than 
by Varied Residue at Fixed Position 

The large family of basophilic kinases has a preference for arginine (R)
at many positions in the substrate (see for example, FIG. 8, FIG. 13, FIG. 22,
and FIG. 29). Accordingly, arginine is a good candidate for an anchor residue at
the high-scoring position(s). With this in mind, over-representing arginine
in the anchor optimization sets used to assign anchor positions is a good first
approach, because the data
indicate that arginine can markedly enhance the efficiency of phosphorylation
when it is present in a peptide substrate for such kinases.

In this Example, an anchor optimization set referred to as an "R-pair set"
was created to systematically evaluate the use of arginine in each position
around P0 (in this set occupied by serine) from position P-7 to P+3. FIG. 30
shows the forty-five peptide sequences of this R-pair set. Results for the R-pair
set using protein kinase A (PKA) are shown in FIG. 31. The results were
calculated in a fashion similar to the sets described previously. Residue
preference was calculated as follows:

[cpm for a peptide, calculated as the geometric mean for replicate values] /
[geometric mean cpm for all peptides in the set].

The position specific residue score was determined by calculating log2 of the
residue preference. An average score for arginine at each position was also
calculated as the arithmetic average of the scores for all nine peptides that have a
fixed arginine at the position. Inspection of the average score reveals that
PKA shows a strong overall preference for arginine at positions P-3 and P-2.
Inspection of the results for individual peptides confirms that PKA most
efficiently phosphorylates the individual degenerate peptide that has arginine
fixed at both P-3 and P-2. These results for PKA are in agreement with a
summary of the literature, for example with results obtained by the Tegge
approach to determining optimal kinase substrates (Tegge W et al. 1995.
Biochemistry 34:10569-10577).
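
The calculation just described can be sketched in Python as follows. The function names, the keying of each peptide by the pair of positions at which arginine is fixed, and the cpm values are all hypothetical and chosen only to make the arithmetic easy to follow; they are not data from FIG. 31.

```python
import math
from statistics import geometric_mean, mean


def position_scores(cpm_replicates):
    """Map each peptide to log2 of its residue preference.

    cpm_replicates: dict mapping a peptide key to its replicate cpm values.
    All values must be positive for a geometric mean to be defined.
    """
    # Geometric mean of the replicates for each peptide.
    peptide_cpm = {k: geometric_mean(v) for k, v in cpm_replicates.items()}
    # Geometric mean cpm over all peptides in the set.
    set_cpm = geometric_mean(list(peptide_cpm.values()))
    # Residue preference is the ratio; the score is its log2.
    return {k: math.log2(c / set_cpm) for k, c in peptide_cpm.items()}


def average_score(scores, position):
    """Arithmetic mean of the scores of all peptides with R at `position`."""
    return mean([s for key, s in scores.items() if position in key])


# Hypothetical cpm values for three peptides, keyed by the two positions
# at which arginine is fixed (a real R-pair set has forty-five peptides).
cpm = {(-3, -2): [800, 800], (-3, 1): [100, 400], (-2, 1): [50, 50]}
scores = position_scores(cpm)
avg_p_minus3 = average_score(scores, -3)
```

In this toy set the per-peptide geometric means are 800, 200 and 50 cpm, the set geometric mean is 200 cpm, and the scores are therefore 2, 0 and -2, giving an average score of 1 for arginine at P-3.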

One simple way to summarize the results of studies with the R-pair set is
to determine the geometric average preference for all peptide pools that have R
at a given position. For example, in this embodiment, there are 9 peptide pools
that have R at P-3 (see FIG. 30 and FIG. 31). The geometric average preference
for R in those 9 pools is 1.5 (FIG. 32). Similar calculations for the other
positions result in the graph shown for PKA in FIG. 32, which likewise
illustrates that PKA prefers R at P-2 and P-3.
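
The counts given above (forty-five pools, nine of which have R at any given position) are what one obtains by fixing arginine at every pair of the ten positions from P-7 to P+3 other than P0. The Python sketch below enumerates such a set under that assumption, which is inferred from the counts and not confirmed against FIG. 30, and computes the geometric average preference per position. Function names and the preference values are hypothetical.

```python
from itertools import combinations
from statistics import geometric_mean

# Ten positions around P0: P-7 ... P-1 and P+1 ... P+3.
POSITIONS = [p for p in range(-7, 4) if p != 0]


def r_pair_set(positions=POSITIONS):
    """All unordered pairs of positions at which arginine is fixed."""
    return list(combinations(positions, 2))


def pool_sequence(pair, positions=POSITIONS, phospho="S"):
    """Symbolic pool: R at the two fixed positions, 'x' elsewhere."""
    out = []
    for p in range(min(positions), max(positions) + 1):
        if p == 0:
            out.append(f"-{phospho}-")
        else:
            out.append("R" if p in pair else "x")
    return "".join(out)


def geometric_average_preference(preferences, position):
    """Geometric mean preference over all pools with R at `position`."""
    return geometric_mean([v for pair, v in preferences.items()
                           if position in pair])


pairs = r_pair_set()
# Hypothetical preferences: pools with R at P-3 are twice the set average.
prefs = {pair: (2.0 if -3 in pair else 1.0) for pair in pairs}
gap_p_minus3 = geometric_average_preference(prefs, -3)
```

Because every position is paired once with each of the other nine, each position contributes exactly nine pools to its average, so no position is favored by the design of the set itself.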

Use of the R-pair set for anchor optimization with other kinases is
likewise highly informative. For example, a comparison of the average
position-specific scores for PKC-alpha and AKT1 with those described above for PKA is
shown in FIG. 32. As shown in FIG. 32, PKC-alpha prefers arginine at P-3, P-2
and P+2. These are precisely the dominant positions at which the strongest
preferences for basic residues have been found in a summary of literature results
for PKC (Kreegipuu A et al. 1998. FEBS Lett 430:45-50). Results from an
R-pair analysis with AKT1 show that arginine is preferably placed at positions P-3
and P-5 (FIG. 32); these results are in agreement with findings from the
literature (Obata T et al. 2000. J Biol Chem 275:36108-36115). Thus, the
strategy provided herein for efficiently scanning for critical residues provides
highly informative results. These residues are candidates for anchor residues for
more complete degenerate residue sets. One key advantage of this particular set
(and the approach of position scanning) is that it provides an impartial way to
assess the most important position for R without introducing biases from other
anchor residues. This general strategy of scanning for the optimal position of a
defined amino acid is referred to herein as "Optimal Residue Position Scanning"
(ORPS). The ORPS approach is further illustrated in Example 12 using arginine
and phenylalanine as the defined amino acids.


EXAMPLE 10: Detection of SHP-1 phosphorylation in whole cells 

Prediction of phosphorylation sites is ultimately most useful to
understanding cellular physiology when it can be applied to facilitate
identification of sites that are relevant in intact cells. This Example illustrates
strategies for analyzing phosphorylation of the SHP-1 protein that extend the
information provided from the previously illustrated in vitro studies.

SHP-1 (also referred to as PTP1C, PTPN6 and SHPTP-1) is a tyrosine
phosphatase that critically regulates many signaling responses, including
activation of T-lymphocytes by the T-cell receptor (Okumura M et al. 1995.
Curr Opin Immunol 7:312-319; Kosugi A et al. 2001. Immunity 14:669-680).
The functioning of SHP-1, and especially its phosphatase activity, is modified by
phosphorylation. Sites thought to be phosphorylated include Y536 and Y564,
both of which are close to the C-terminus of the molecule (Zhang Z et al. 2003.
J Biol Chem 278:4668-4674).

SHP-1 has been shown to be a substrate for serine phosphorylation by
PKC (Zhao Z et al. 1994. Proc Natl Acad Sci U.S.A. 91:5007-5011).
Phosphorylation of SHP-1 by PKC results in decreased catalytic activity of
SHP-1 (Brumell JH et al. 1997. J Biol Chem 272:875-882). Other investigators have
shown that a closely related phosphatase, SHP-2, is phosphorylated on serine
residues close to its C-terminus (Strack V. et al. 2002. Biochemistry
41:603-608). However, Strack et al. (id.) incorrectly inferred that SHP-1 is not
phosphorylated by PKC and previous studies have not identified the critical site
of phosphorylation by PKC.

Phosphorylation of SHP-1 was analyzed using the methods provided
herein, including the predictive algorithm for PKC-theta. Because
phosphorylation by PKC-theta correlates highly with that for PKC-alpha and
PKC-delta, these predictions have relevance at least for PKC-alpha and
PKC-delta, and likely provide a generalized prediction for novel and classical PKCs.

Table 7 provides the predictions made by the methods of the invention
for SHP-1 phosphorylation. For PKC phosphorylation using the fifth percentile
as a conservative cutoff that will include all plausible candidate sites for PKC
(see FIG. 9 and FIG. 11), only three sites in SHP-1 are predicted to be
phosphorylated (sites Ser-591, SEQ ID NO 298; Ser-26, SEQ ID NO 299; and
Ser-32, SEQ ID NO 300).

TABLE 7. Three Predicted PKC Phosphorylation sites in SHP-1
whose corresponding phosphopeptides bind best to pPKC antibody

Gene and       SEQ     Phosphopeptide          Site                               pPKC antibody
Protein Name   ID NO   Sequence                (P0)   PKC-Theta  PKC-Zeta  PKA    Score

SHP-1          298     ADKEKSKG-pS-LKRK —      591    1          8         10     4
               299     LKGRGVHG-pS-FLARPSRK    26     0^3        pM]       10     2
               300     HGSFLARP-pS-RKNQGDFS    32     2          2         20     3

               289     MKNAHAKA-pS-RTSSKHKE    553    8          8         10     2
               290     RVELQGRD-pS-NIPGSDYI    294    60         60        10     2
               291     AHAKASRT-pS-SKHKEDVY    556    10         20        30     3
               292     KKKLEVLQ-pS-QKGQESEY    528    30         30        90     2
               293     PSEPGGVL-pS-FLDQINQR    431    50         50        30     2
               294     HAKASRTS-pS-KHKEDVYE    557    8          7         2      1
               295     PWTFLVRE-pS-LSQPGDFV    138    40         20        7      3
               296     KNQGDFSL-pS-VRVGDQVT    42     10         20        50     3
               297     PLNCSDPT-pS-ERWYHGHM    107    60         60        90     2


As shown in Table 3, a peptide that includes Ser-591 is phosphorylated
by PKC (see SEQ ID NO:209, in Table 3). In particular the in vitro
phosphorylation by PKC-theta was measured for the DKEKSKGSLKRK —
(SEQ ID NO:209) peptide and shown to be 17. A commercially available
antibody from Cell Signaling Technology, referred to as a phospho-PKC motif
antibody (designated herein as pPKC Ab), was used to generate the antibody
binding data illustrated in Table 3. (See U.S. Patent 6,441,140 and Cell
Signaling Technology Datasheet for 'Phospho-(Ser) PKC Substrate Antibody').
Information from Cell Signaling Technology indicates that this antibody
preparation may recognize a motif consisting of a positively charged residue at
P-2, a serine at P0, a hydrophobic residue at P+1 and a positively charged residue
at P+2. Such antibodies can be used for detection of unknown proteins that
contain phosphorylation sites conforming to the motif to which they bind. For
example, phosphorylated proteins can be detected on two-dimensional gels with
the pPKC Ab and the identity of these phosphorylated proteins can be confirmed
by the observed molecular weight, isoelectric point and other information such
as the predictive algorithms provided herein. Similarly, such detected proteins
can be enriched by classical biochemical separations, and when sufficiently
enriched, can be identified by mass spectrometry (Astoul E et al. 2003. J Biol
Chem 278:9267-9275).

One basis for predicting whether the pPKC antibody can bind to a
particular phosphorylation site is the extent of its conformity with the motif
described for the antibody: [RK]x-pS-[I^[TLMV][RK]. Therefore for each
candidate site in SHP-1, a score from 0 to 4 was calculated based on the number
of matches of the sequence to that pattern. That "pPKC antibody score" is
tabulated for pertinent SHP-1 sites in Table 7. Ser-591 is the only site in SHP-1
that has a perfect score of 4.
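
A minimal Python sketch of such a motif score is given below. It rests on two assumptions that should be read as such: first, that the four scored positions are P-2 (R or K), P0 (phosphoserine), P+1 (hydrophobic) and P+2 (R or K), with the serine itself counting as one match; second, that the hydrophobic class is F, I, L, M or V, since the bracketed class in the motif as printed above is partly illegible. The function name and input format are hypothetical. Under these assumptions the sketch reproduces the pPKC antibody scores listed in Table 7 for the sites checked.

```python
BASIC = set("RK")
# Assumed hydrophobic class; the printed motif is partly illegible.
HYDROPHOBIC = set("FILMV")


def ppkc_antibody_score(site):
    """Score a site written like 'ADKEKSKG-pS-LKRK' from 0 to 4.

    The text before '-pS-' ends at P-1 and the text after it starts at P+1.
    """
    before, phospho, after = site.partition("-pS-")
    if not phospho:
        raise ValueError("site must contain '-pS-' at the P0 position")
    score = 1  # P0 is a phosphoserine (assumed to count as one match)
    if len(before) >= 2 and before[-2] in BASIC:      # P-2 basic
        score += 1
    if len(after) >= 1 and after[0] in HYDROPHOBIC:   # P+1 hydrophobic
        score += 1
    if len(after) >= 2 and after[1] in BASIC:         # P+2 basic
        score += 1
    return score


ser591 = ppkc_antibody_score("ADKEKSKG-pS-LKRK")
ser26 = ppkc_antibody_score("LKGRGVHG-pS-FLARPSRK")
ser32 = ppkc_antibody_score("HGSFLARP-pS-RKNQGDFS")
```

For Ser-591 all four positions match (K at P-2, S at P0, L at P+1, K at P+2), whereas Ser-26 matches only at P0 and P+1 and Ser-32 misses only the hydrophobic residue at P+1.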
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To test whether phosphorylation actually occurs at these sites in vivo, an 
antibody specific for the corresponding phosphorylated peptide can be used.
However, because the identity of the relevant sites was previously unknown, no
such specific antibodies were available. The inventor therefore devised an
alternative approach using the pPKC Ab. Although antibodies such as the pPKC
Ab are poly-specific, they can be constrained to provide information on the
phosphorylation state of a particular molecule such as SHP-1 by isolating the
molecule of interest and then testing the antibody for reactivity with that isolated
molecule. That strategy was implemented for SHP-1.
In particular, SHP-1 was immunoprecipitated from the cell lysate of the
cell line JURKAT with an anti-SHP-1 antibody (C-19; from Santa Cruz
Biotechnologies) and protein G beads. The purified SHP-1 was separated by
standard polyacrylamide gel electrophoresis, transferred onto a membrane, and
blotted with 2 different antibodies as shown in FIG. 15. Results from Western
blotting with the anti-SHP-1 antibody (C-19 from Santa Cruz Biotechnologies)
demonstrate that SHP-1 was successfully isolated and that it had a molecular
weight of 64 kD, characteristic of SHP-1. That SHP-1 immunoprecipitate also
reacted with the pPKC motif Ab, indicating that a phosphorylated site(s) exists
on SHP-1 that conforms to the motif recognized by the pPKC antibody.
FIG. 15 also provides information on JURKAT cells stimulated to
activate SHP-1 via a T-cell receptor. Specifically, Jurkat T Ag cells were
stimulated with CD3 antibody (clone 38.1, IgM ascites, 1:1000 final) plus CD28
antibody (clone 9.3, sup, 1:1000 final) for different times, as indicated in FIG.
15. The amount of phosphorylated SHP-1, detected by intensity of the band on
the pPKC antibody Western blot, increased markedly within the first minute
following stimulation. These data demonstrate that the phosphorylation of
SHP-1 at the sites recognized by the antibody is increased following T-cell receptor
stimulation. Thus, the site(s) on SHP-1 detected by the pPKC antibody (Table
7) are biologically relevant for immune cell responses (FIG. 15).
Two lines of evidence strongly suggested that S591 was a functionally
significant phosphorylation site on SHP-1: S591 was uniquely strongly predicted
to be phosphorylated by PKC, and S591 had a uniquely good fit to the pattern
detected by the pPKC antibody.

To directly test the functional significance of S591, a SHP-1 construct
was generated in which S591 was mutated to alanine (i.e. S591A mutation) to
test whether SHP-1 was still phosphorylated in the absence of the S591 residue.
The mutation was created using the QuikChange methodology from Stratagene.
Using similar methods, an A148E mutation was also made in PKC-theta to
generate a construct encoding constitutively active PKC-theta. Wild type SHP-1
and S591A mutant SHP-1 were transfected into 293T cells using calcium
phosphate transfection in the presence or absence of the constitutively active
PKC-theta construct. The transfected cells were cultured for 24 hr, lysed, and
analyzed by Western blot in a manner generally similar to FIG. 15. Two
important results came from the analysis (FIG. 42). First, co-transfection of
PKC-theta with wild type SHP-1 resulted in phosphorylation of SHP-1 as
detected by the pPKC antibody. Second, such phosphorylation was absent in
the S591A construct, indicating that S591 is a major, if not the major, site of
SHP-1 phosphorylation. These results therefore established that SHP-1 S591
can be phosphorylated by PKC-theta.

Although the pPKC antibody can identify important phosphorylation
sites, the pPKC antibody is designed to recognize many different
phosphorylation sites that have basic residues at P-2 and P+2. For example, as
described by its manufacturer, Cell Signaling Technology, the pPKC antibody
binds to SEQ ID NO:229 (WKN-pS-IRH). Hence, the pPKC antibody is not
particularly site-specific.

Therefore a site-specific phospho-antibody was generated. A
phospho-peptide having sequence CDKEKSKG-(pS)-LKRK-OH (SEQ ID NO:570) was
made. This phospho-peptide includes a sequence that corresponds to the
C-terminus of SHP-1 but, in addition, it has an N-terminal cysteine useful for
coupling to a carrier. The corresponding non-phosphorylated peptide was also
synthesized for use as a control. The phospho-peptide (SEQ ID NO:570) was
coupled onto the carrier KLH, rabbits were immunized, and anti-sera samples
were screened for reactivity with the phospho-peptide by ELISA assay.
Antibodies reactive with the corresponding non-phosphorylated peptide were
removed from anti-sera by passing the anti-sera through a column having the
non-phosphorylated peptide bound to the column matrix. Finally, anti-sera were
enriched for phospho-specific reactivity by use of an affinity column made from
the phospho-peptide.

The specificity of the antibody for SHP-1 pS591 was confirmed by
Western blot analysis (FIG. 43). When the anti-SHP-1 pS591 antibody was used
at a dilution of 1:15,000, only a single strong band was detected on a Western
blot of a lysate of Jurkat cells. The position of this band was characteristic of
SHP-1. In contrast, the pPKC antibody bound to many bands. Binding of the
anti-SHP-1 pS591 phospho-antibody depended entirely on S591 because no such
binding was detected in lysates of cells that expressed the SHP-1 S591A mutant
(co-transfected with constitutively-active PKC-theta). Thus, unlike the pPKC
antibody, this anti-pS591 antibody had narrow specificity and was sufficiently
specific for detection of only SHP-1 S591 phosphorylation. Prior
immunoprecipitation of SHP-1 was not needed when the anti-pS591 antibody
was employed. The strong reactivity of this antibody with phosphorylated
SHP-1 facilitated demonstration that CD3 cross-linking stimulates phosphorylation of
SHP-1 both in cultured JURKAT cells and in normal mouse
thymocytes.

PKC inhibitors were then used to further confirm that PKC mediates
CD3/CD28-induced phosphorylation of SHP-1 (FIG. 44). Jurkat cells were
stimulated with CD3/CD28 after pre-treatment with graded concentrations of
two PKC inhibitors: BIM I and BIM III. As shown in FIG. 44, SHP-1
phosphorylation was reduced by 1 micromolar concentrations of BIM I and BIM
III and was virtually abolished at BIM I and BIM III concentrations of 5
micromolar.

The specificity of the anti-SHP-1 pS591 antibody was also demonstrated
by in situ immunofluorescence studies (FIG. 45). Experiments were conducted
with wild type and S591A constructs of SHP-1 N-terminally tagged with the
fluorescent marker GFP. These constructs were transfected into 293T cells, the
cells were then cultured for 24 hr, fixed, permeabilized, and stained.
Immunofluorescent staining for SHP-1 phosphorylation was performed by
incubating cells first with rabbit anti-pS591 and subsequently with an anti-rabbit
antibody linked to the Alexa 568 fluorophore. FIG. 45 shows staining by
anti-pS591 antibodies of cells transfected with wild type SHP-1 but not of cells
transfected with S591A SHP-1.

Further investigation of the subcellular localization of SHP-1 in Jurkat
cells indicates that phosphorylation regulates the ability of SHP-1 to translocate
into the nucleus. FIG. 46 illustrates that C-terminally GFP-tagged SHP-1 (seen
as a light stain, green in the original) was located primarily in the nucleus. The
S591A mutant of SHP-1 was also detected in the nucleus, but the S591D mutant
was largely excluded from the nucleus. The change in SHP-1 of S591 to D591
mimics phosphorylation at residue 591, and caused exclusion from the nucleus.
Moreover, co-transfection of 293T cells with SHP-1 and constitutively active
PKC-theta (which causes phosphorylation of SHP-1 S591, see FIG. 43) results
in exclusion of SHP-1 from the nucleus. However, incubation of
SHP-1/PKC-theta expressing cells with the PKC inhibitor BIM I causes the SHP-1 to become
localized within the nuclei (FIG. 46B). Also, as shown in FIG. 46C, the ability
of PKC-theta to cause exclusion of SHP-1 from the nucleus is destroyed by
mutation of S591 to alanine (A). Thus, multiple lines of evidence indicated that
phosphorylation of S591 causes exclusion of SHP-1 from the nucleus.

EXAMPLE 11: Additional examples of proteins predicted to have good
PKC phosphorylation sites and found to bind pPKC antibody by Western
blot

The predictive power of the methods of the invention is further illustrated
in this Example by studies of the proteins LIMK-2 and MLK3. LIMK-2 and
MLK3 were identified as promising candidates for phosphorylation by PKC
based on predictions for PKC-theta described herein and confirmation of that
prediction by in vitro peptide phosphorylation (SEQ ID NO: 76 in Table 4 and
SEQ ID NO: 121 in Table 5).

In vitro binding experiments were performed to determine whether the
pPKC Ab bound to predicted phosphorylated sites in MLK3 and LIMK-2.
Synthetic peptides chosen from those shown in Table 4 were subjected to
phosphorylation by PKC-theta. Assay conditions were similar to those
described herein, except that the phosphorylation reaction was for 30 minutes at
30 °C and then overnight at 4 °C. The reaction mixture was applied to HB
avidin-coated plates, the plates were washed, and then pPKC Ab binding was
determined. The results of these assays are summarized in Table 8.






TABLE 8. The pPKC Antibody binds to peptides

As shown in Table 8, the pPKC Ab bound to peptides from LIMK-2 and from
MLK3 after phosphorylation but not before. Results for a control peptide
(ROCK2) are also shown; the ROCK2 peptide is not phosphorylated by PKC and
shows no change in binding to pPKC Ab after the peptide was exposed to
PKC-theta.

The question of in vivo relevance of LIMK-2 phosphorylation was
addressed using the strategy used above for SHP-1. LIMK-2 was
immunoprecipitated with anti-LIMK2 antibody H-78 purchased from Santa Cruz
Biotechnologies, separated by one-dimensional PAGE and analyzed by Western
blot. The Western blot shown in FIG. 33 illustrates that LIMK-2 was
immunoprecipitated from T-lymphocytes before and after T-cell receptor
stimulation and the pPKC antibody bound to LIMK-2, indicating
phosphorylation of LIMK-2. Note that the pPKC signal was observed only on
the sample from T-cell receptor stimulated cells, indicating that phosphorylation
of LIMK-2, as detected by the pPKC antibody, occurred during T-cell receptor
stimulation.

Similar studies were performed with the MLK3 protein. Jurkat T Ag cells
(10 million) were stimulated with CD3 (clone 38.1, IgM ascites, 1:1000 final)
plus CD28 (clone 9.3, sup, 1:1000 final), or with PMA (200 ng/ml) for 5 minutes.
MLK3 was immunoprecipitated from the cell lysate with anti-MLK3 Ab (H-300;
from Santa Cruz) and protein G beads. The immunoprecipitated MLK3 was
subjected to Western blotting and one blot was probed with the pPKC Ab while
another blot was probed with the MLK3 Ab. As shown in FIG. 34, MLK3 has
strong reactivity with the pPKC antibody both before and after stimulation of
Jurkat cells. The predicted phosphorylation site at Ser-477 on MLK3
corresponds to one of the very best detected in the entire human proteome, and
the Jurkat cell line is a partially activated transformed cell line. The binding
of pPKC antibody therefore likely reflects phosphorylation of MLK3 that is
present even in unstimulated cells.

EXAMPLE 12: Evaluation of best positions for arginine and phenylalanine 
in an RF-pair peptide set for PKC-theta phosphorylation 

Example 9 introduced the idea of "Optimal Residue Position Scanning"
(ORPS) using pairs of R residues at all possible positions near P0. This
Example further illustrates the ORPS approach including the design, synthesis
and testing of a set of degenerate peptides in which a single arginine and a single
hydrophobic (phenylalanine) residue are the only two fixed residues near a
phosphorylatable residue (S at P0). Arginine was chosen for this analysis
because of its importance to basophilic kinases. A hydrophobic residue was
chosen as the second residue because a synthesis of the scientific literature
indicated that one or a few hydrophobic residues are often important
determinants of the specificity of multiple kinases. For example, several PKCs
have an apparent preference for a hydrophobic residue at P+1. While a variety
of hydrophobic residues exist, including, for example, phenylalanine or leucine
or a mixture of several residues (such as isoleucine, leucine, methionine, valine
and/or phenylalanine), for this proof of principle a single hydrophobic residue
(F) was selected to maximize informative design consistency between this set
and the RxxSF set.

Design details for the RF-pair set are illustrated in FIG. 36. As in other
peptide sets, each peptide consisted of an N-terminal linker (biotin-dansylated
lysine and glycine) followed by a 13 residue insert. The insert consisted of a
fixed serine residue flanked by eight N-terminal residues and four C-terminal
residues. Each peptide had a single R at a position ranging between P-7 and P+4
and a single F at another position ranging between P-7 and P+3. The symbolic
representation of two such peptides is shown in FIG. 36. Altogether the peptide
set included all possible combinations of R and F at positions between P-7 and
P+3 (excluding P0).
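
The combinatorial design described above can be sketched in Python. This is an illustrative enumeration only, not the synthesis protocol. It assumes that R may occupy P-7 through P+4 and F may occupy P-7 through P+3, that P0 holds the fixed serine, and that every remaining position of the 13 residue insert is degenerate (written here as `x`). All function and variable names are hypothetical.

```python
from itertools import product

# Insert layout assumed from the text: eight residues N-terminal to the fixed
# serine (P-8..P-1), the serine at P0, and four residues C-terminal (P+1..P+4).
INSERT_POSITIONS = list(range(-8, 5))

# Assumed allowed positions; P0 is excluded because it holds the serine.
R_POSITIONS = [p for p in range(-7, 5) if p != 0]  # P-7 .. P+4
F_POSITIONS = [p for p in range(-7, 4) if p != 0]  # P-7 .. P+3


def rf_pair_designs():
    """Yield (r_pos, f_pos, symbolic_sequence) for every RF-pair peptide."""
    for r_pos, f_pos in product(R_POSITIONS, F_POSITIONS):
        if r_pos == f_pos:
            continue  # one position cannot hold both fixed residues
        residues = []
        for p in INSERT_POSITIONS:
            if p == 0:
                residues.append("S")
            elif p == r_pos:
                residues.append("R")
            elif p == f_pos:
                residues.append("F")
            else:
                residues.append("x")  # degenerate position
        yield r_pos, f_pos, "".join(residues)


designs = list(rf_pair_designs())
print(len(designs), "peptide designs")
for r_pos, f_pos, seq in designs[:3]:
    print(f"R at P{r_pos:+d}, F at P{f_pos:+d}: {seq}")
```

Under these assumptions the enumeration gives 9 designs with R at P-2 and 10 designs with F at P+1, which agrees with the "7 of 9" and "8 of 10" counts quoted in the analysis of FIG. 36.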

The specificity of PKC-theta for various peptides was assessed using
PKC-theta phosphorylation reactions with peptides of the set, then calculating
log scores as described above for the R-pair set. In FIG. 36, scores showing
distinctly favored phosphorylation (>0.5) are highlighted with bold and
underlined while those showing distinctly disfavored phosphorylation (<-0.5) are
bold but not underlined. Visual inspection of the results indicates underlying
patterns. The position most favored for R is P-2 because 7 of 9 peptides in that
column are distinctly favored. The P-3 position is also favored for R (4 of 9
peptides distinctly favored). The position P+1 is clearly most favored for F
because 8 of 10 peptides in that row are distinctly favored.

An alternate way to assess residue preference at a position is by
determining the average score for all peptides sharing that residue at that
position. Those values are shown in the right-hand column and the bottom row
of FIG. 36. FIG. 37A provides a graph of the average position-specific
preferences of PKC-theta. As shown in FIG. 37, analysis of the RF-pair set
indicates that P-2 is the preferred position for R and P+1 the preferred position
for F. These results for arginine are similar to those obtained in Example 9 for
arginine alone. Thus, analysis of PKC-theta with the R-pair set (FIG. 37B) also
indicates that P-2 is the single most important position for an R residue in
PKC substrates.
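
The averaging step can be illustrated with a minimal Python sketch. It assumes that each RF-pair peptide is identified by its R position and F position and carries a single log score; the scores below are made-up values chosen for illustration and are not data from FIG. 36. The function name `positional_averages` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def positional_averages(scores):
    """Average log scores over all peptides sharing a residue position.

    `scores` maps (r_pos, f_pos) -> log score for one RF-pair peptide.
    Returns two dicts: mean score by R position and mean score by F position.
    """
    by_r, by_f = defaultdict(list), defaultdict(list)
    for (r_pos, f_pos), score in scores.items():
        by_r[r_pos].append(score)
        by_f[f_pos].append(score)
    r_avg = {pos: mean(vals) for pos, vals in by_r.items()}
    f_avg = {pos: mean(vals) for pos, vals in by_f.items()}
    return r_avg, f_avg


# Made-up scores for four peptides, for illustration only.
example_scores = {
    (-2, 1): 1.5,
    (-2, 3): 0.5,
    (-3, 1): 0.75,
    (-5, 3): -1.0,
}
r_avg, f_avg = positional_averages(example_scores)
best_r = max(r_avg, key=r_avg.get)
best_f = max(f_avg, key=f_avg.get)
print(f"preferred R position: P{best_r:+d}; preferred F position: P{best_f:+d}")
```

Averaging across the whole row or column makes the preference for one residue less dependent on where the other fixed residue happens to sit in any single peptide.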

As indicated in previous Examples, analysis of PKC-theta with the
RxxSF set of peptides was quite informative. It seems likely that analysis of
peptide specificity will be even more informative when "systematic amino acid
variation on template substrate" (SAaVoTS) is used to design better peptide sets
(e.g. RxSF). Thus, the R-pair and RF-pair sets serve the critical purpose of
objectively determining what are good residue choices for positional scanning
approaches (SAaVoTS). (See also Example 14).

FIG. 38 shows the distribution of log2 scores for PKC-theta with the
RF-pair set, sorted from highest to lowest scores. As shown in FIG. 38, there are
4-7 peptides that are distinctly superior in their phosphorylation, rather than a
single peptide in the RF-pair set that is exceptionally well phosphorylated. This
is consistent with complex additive or alternative modes of binding of substrate.
If particularly high resolution analysis of specificity of PKC-theta is required,
then analysis with SAaVoTS sets based on several of these RF-pair peptides is
likely to provide additional information.

EXAMPLE 13: Analysis of kinases with a "diverse basic proteomic set,"
which is enriched for sequences located near the N- and C-termini of
proteins.

Although degenerate peptides are particularly useful for studying kinase
peptide specificity, strategic use of non-degenerate peptides can also be
effective. Thus, a set of 96 peptides with defined sequences was designed and
synthesized, each comprised of a preferred N-terminal linker and a 17 residue
insert (Table 9). The inserts were chosen by the following criteria. First, only
sequences from the human proteome were selected. Second, peptide choice was
biased towards sequences that basophilic kinases favor for phosphorylation,
especially PKC-theta, using the prediction methods described herein.
Consequently the sequences were enriched in basic residues: R was enriched in
the peptides to an abundance of 19.3%, more than three-fold higher than that
observed in the human proteome (about 6%); and K was enriched in the peptides
of the set to 12.3%, more than two-fold higher than observed in the human
proteome. Moreover, 80% of the peptides were in the top 5 percentiles for
predicted phosphorylation by PKC-theta. Third, the diversity of the peptides
was enhanced by manually selecting sequences having diverse residues at
positions strongly biased by the PKC preference (especially diversity at the P-2,
P-3 and P-4 positions). Fourth, the set was enriched for peptides corresponding
to proteins that are well expressed in hematopoietic cells so that findings would be
most relevant to the inventor's field of interest. Fifth, the peptide set was 
enriched for sequences at or near the C-terminus of the protein (46 of the 96
peptides) and the N-terminus (5 of 96 peptides). This choice to emphasize C-
and N-terminal peptides was made based on the knowledge that sequences near
the termini of proteins are the most likely to be available for interactions
with other proteins. Although the accessibility of protein termini is best known in
the context of protein immunization/detection with antibody, the data illustrated
herein indicate that the same principle applies to the accessibility of termini for
interactions with other proteins (such as kinases). Moreover, there is
experimental evidence that basic residues at the C-termini have special
importance to protein function (Scheglmann D, Werner K, Eiselt G, Klinger R.
2002. Protein Eng 15:521-528). Sixth, the peptide set was enriched for
sequences that were not strongly hydrophobic (see Table 9 column
"hydrophobic"; the hydrophobicity scores for individual residues are shown in
FIG. 14). The mean hydrophobicity of peptide sequences from the human
proteome that have 17 residues is about 0.34, while the mean hydrophobicity of
the 96 peptides in Table 9 was in the fifth percentile for the proteome (< -0.07).
The selection of hydrophilic peptides further enhanced the likelihood that these
sites would be accessible for phosphorylation and functional interaction in native
proteins.

This set of peptides is very useful for identifying new sites for basophilic
kinases because the set has many potential phosphorylation sites (total=310) and
the set diversely represents many patterns of residues, including basic residues
around the phosphorylation sites. Seventy-six of its peptides (79%) include 2
arginines within 6 positions of a Ser/Thr, and 56 of its peptides (58%) include 3
arginines within 6 positions of a Ser/Thr. This is much higher than the frequency
of these patterns in 17-residue peptides in the human proteome, which is 18% for
2 arginines (4 fold lower than in this set) and 5% for 3 arginines (12 fold lower).
Thus, the probability of assembling a peptide set with 4 fold higher abundance of
this pattern by chance alone is vanishingly small, even for a set of only 10
peptides, much less a set of 96. Hence, the usefulness of this set is related to the
purposeful enrichment of arginines in diverse positions near the Ser/Thr
phosphorylation site.
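
One way to count the arginine patterns discussed above is sketched below in Python. The sketch assumes that "within 6 positions" means a sequence distance of at most six residues on either side of the Ser/Thr, and it counts peptides having at least the stated number of arginines. The example peptides are invented 17-residue strings, not sequences from Table 9, and the function names are hypothetical.

```python
def has_arginine_pattern(peptide, min_arginines, window=6):
    """Return True if some Ser/Thr has >= min_arginines R within `window` positions."""
    for i, residue in enumerate(peptide):
        if residue not in "ST":
            continue
        start = max(0, i - window)
        # Residues flanking the Ser/Thr on both sides, excluding the site itself.
        neighborhood = peptide[start:i] + peptide[i + 1:i + 1 + window]
        if neighborhood.count("R") >= min_arginines:
            return True
    return False


def pattern_fraction(peptides, min_arginines, window=6):
    """Fraction of peptides that contain the arginine pattern at least once."""
    hits = sum(has_arginine_pattern(p, min_arginines, window) for p in peptides)
    return hits / len(peptides)


# Invented 17-residue peptides, for illustration only.
example_peptides = [
    "AAARRASAAAAAAAAAA",  # two R within 6 positions of S
    "RAAAAAAAAAAAAAAAS",  # R too far from S
    "GGRRRGTGGGGGGGGGG",  # three R within 6 positions of T
    "AAAAAAAAAAAAAAAAA",  # no Ser/Thr and no R
]
print(pattern_fraction(example_peptides, 2))
print(pattern_fraction(example_peptides, 3))
```

Applied to the figures quoted above, the ratio 79%/18% is about 4.4 and the ratio 58%/5% is about 11.6, consistent with the stated 4 fold and 12 fold differences.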

Table 9 also tabulates results from phosphorylating this panel of peptides
with 5 different kinases. Phosphorylation results for each peptide are expressed
as percentage of phosphorylation of the best substrate by the same kinase. The
kinases AKT1, PAK1 and MST4 were purchased from Cell Signaling
Technology and assayed according to the protocol provided by the manufacturer
ProQinase.
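
The normalization to the best substrate can be sketched as follows. This Python sketch assumes that a raw phosphorylation signal is available for each peptide for a given kinase; the peptide names and signal values are hypothetical. The 10% cutoff is the one used in Example 14 to separate phosphorylated from poorly phosphorylated peptides.

```python
def percent_of_best(raw_signals):
    """Scale each peptide's signal to a percentage of the best substrate."""
    best = max(raw_signals.values())
    if best <= 0:
        raise ValueError("no peptide shows a positive phosphorylation signal")
    return {name: 100.0 * value / best for name, value in raw_signals.items()}


def is_substrate(percent, threshold=10.0):
    """Classify a peptide as phosphorylated if it exceeds the threshold."""
    return percent > threshold


# Hypothetical raw signals for one kinase, for illustration only.
raw = {"pep_A": 2000.0, "pep_B": 500.0, "pep_C": 100.0}
scaled = percent_of_best(raw)
substrates = [name for name, pct in scaled.items() if is_substrate(pct)]
print(scaled)
print(substrates)
```

Because each kinase is scaled to its own best substrate, the percentages compare peptides for one kinase and do not compare absolute activity between kinases.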

Table 9 illustrates that a high frequency of peptides are phosphorylated 
by PKC-theta (50 out of 96) and to a lesser extent PKC-zeta (27 out of 96). 
These results are not surprising based on the selection of peptides with sites 
having scores in the top 5 percentile for PKC-theta. 

One useful finding was that many peptides (i.e. more than ten) were
phosphorylated by two basophilic kinases AKT1 and PAK1, even though the
peptides in this set were not specifically selected to provide substrates for those
kinases. Thus, the intentional selection of a diverse distribution of arginines
around the phosphorylation site provided an enriched set of peptides that
effectively acted as substrates for these kinases. For example, AKT1
phosphorylated 13/96 peptides but only one peptide (from GSK-3) was
intentionally chosen as a control for AKT1 phosphorylation. Similarly, PAK1
phosphorylated 16/96 peptides.

Of particular note, six peptides were substrates for the kinase MST4,
which was previously not known to be basophilic. Ongoing analysis using the
approaches described herein indicates that MST4 is basophilic and prefers basic
residues at positions P+4 to P+6 (data not shown). These newly identified
peptide substrates are useful for development of better in vitro kinase assays.
This is particularly true for MST4, because a good peptide substrate has not yet
been identified for MST4.

Importantly, the peptides of Table 9 constitute likely candidates for in
vivo phosphorylation in native proteins because these sites are located
near protein termini.

This "diverse basic proteomic set" can also be useful in analysis of
residue preference of basophilic kinases, as included in Example 14 below.

EXAMPLE 14: Analysis of a kinase whose specificity is poorly defined 
with the RF-pair, the R-pair and the diverse basophilic proteomic set. 

This Example illustrates the specificity of PAK1, as proof of principle
that the inventive methods enable better characterization of a basophilic kinase
whose specificity was previously incompletely defined. PAK1 belongs to the
STE20 family of Ser/Thr kinases.

FIG. 39 shows the analysis of PAK1 with the R-pair set. These results
illustrate the singular and consistent importance of R at the P-2 position to PAK
phosphorylation.

FIG. 40 shows analysis of PAK1 with the RF-pair set. The analysis of
average preference from this set also strongly affirms the singular importance of
R at P-2, and also indicates a modest average preference for F at P-1, P+1 and
P+3. Looking at the results for individual peptides in the set, it is apparent that
each of the peptides RRxS, RRS, RFS, RRxxS and RxSxxF is strongly
favored. Thus, each of these peptides could be used as the basis for a
SAaVoTS degenerate set for more detailed analysis of PAK specificity.

Analysis of PAK with the "diverse basic proteomic set" proved to be
informative. Table 9 includes a tabulation of the results of phosphorylation of
that peptide set by PAK. Underlying sequence patterns were analyzed to
differentiate between substrate motifs that are phosphorylated (i.e. >10% of the
best substrate) and those that are poorly phosphorylated (<10% of the best
substrate). The most informative results for PAK demonstrate that R at position
P-2 is singularly important for phosphorylation of peptides in this set by PAK
(FIG. 41). FIG. 41A shows the procedure for a chi-square analysis to determine
whether arginine at position P-3 (relative to a phosphorylation site) contributes
to phosphorylation of the 16 positively phosphorylated peptides. FIG. 41A
tabulates the results: 10 of the phosphorylated peptides have arginine at position
P-3 while 6 do not; 45 of the non-phosphorylated peptides have arginine at
position P-3 and 35 do not. The bottom half of FIG. 41A shows the calculation
of expected distribution of peptides if the R at P-3 and the phosphorylation are
independent of each other. The bottom row tabulates the probability (from a
chi-square test) that the R at P-3 is correlated with phosphorylation. In the case
of R at P-3, there is no significance to the correlation (p~0.6). In the case of R
at P-2, the probability is very significant (p<0.0001). All of the 16 phosphorylated
peptides comprise a site with R at P-2 relative to an S or T (shown in FIG. 41B);
in contrast less than half of the non-phosphorylated peptides have that pattern.
FIG. 41C shows the p-values for analysis of R at all positions between P-6 and
P+3; the results demonstrate that R at P-2 is unique in its importance.
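
The chi-square calculation for R at P-3 can be reproduced from the counts given above. The following Python sketch computes a Pearson chi-square statistic for a 2x2 table with one degree of freedom and no continuity correction; whether the analysis in FIG. 41 applied a continuity correction is not stated, so that choice is an assumption. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with one degree of freedom and no continuity
    correction.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the row and column variables are independent.
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For one degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts quoted in the text for R at P-3:
# phosphorylated: 10 with R, 6 without; non-phosphorylated: 45 with R, 35 without.
chi2, p = chi_square_2x2(10, 6, 45, 35)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

With these counts the p-value comes out at roughly 0.6, in line with the value reported above for R at P-3.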

Thus the R-pair analysis, the RF-pair analysis and analysis with the
"diverse basic proteomic set" each show that the P-2 position occupies a place of
dominant importance in determining kinase specificity. The consistency
between these independent approaches is strong evidence for their validity as
well as for the validity of the finding that R at P-2 is unusually important to
PAK. It is notable that the approaches provided herein define kinase sequence
specificity regarding this most critical location of basic residues more precisely
than was done by previous workers (Tuazon PT, Spanos WC, Gump
EL, Monnig CA, Traugh JA. 1997. Biochemistry 36:16059-16064).

All patents and publications referenced or mentioned herein are
indicative of the levels of skill of those skilled in the art to which the invention
pertains, and each such referenced patent or publication is hereby incorporated
by reference to the same extent as if it had been incorporated by reference in its
entirety individually or set forth herein in its entirety. Applicants reserve the
right to physically incorporate into this specification any and all materials and
information from any such cited patents or publications.

The specific methods and compositions described herein are
representative of preferred embodiments and are exemplary and not intended as
limitations on the scope of the invention. Other objects, aspects, and
embodiments will occur to those skilled in the art upon consideration of this
specification, and are encompassed within the spirit of the invention as defined
by the scope of the claims. It will be readily apparent to one skilled in the art
that varying substitutions and modifications may be made to the invention
disclosed herein without departing from the scope and spirit of the invention.
The invention illustratively described herein suitably may be practiced in the
absence of any element or elements, or limitation or limitations, which is not
specifically disclosed herein as essential. The methods and processes
illustratively described herein suitably may be practiced in differing orders of
steps, and they are not necessarily restricted to the orders of steps indicated
herein or in the claims. As used herein and in the appended claims, the singular
forms "a," "an," and "the" include plural reference unless the context clearly
dictates otherwise. Thus, for example, a reference to "an antibody" includes a
plurality (for example, a solution of antibodies or a series of antibody
preparations) of such antibodies, and so forth. Under no circumstances may the
patent be interpreted to be limited to the specific examples or embodiments or
methods specifically disclosed herein. Under no circumstances may the patent
be interpreted to be limited by any statement made by any Examiner or any other
official or employee of the Patent and Trademark Office unless such statement is
specifically and without qualification or reservation expressly adopted in a
responsive writing by Applicants.

The terms and expressions that have been employed are used as terms of
description and not of limitation, and there is no intent in the use of such terms
and expressions to exclude any equivalent of the features shown and described
or portions thereof, but it is recognized that various modifications are possible
within the scope of the invention as claimed. Thus, it will be understood that
although the present invention has been specifically disclosed by preferred
embodiments and optional features, modification and variation of the concepts
herein disclosed may be resorted to by those skilled in the art, and that such
modifications and variations are considered to be within the scope of this
invention as defined by the appended claims.

The invention has been described broadly and generically herein. Each
of the narrower species and subgeneric groupings falling within the generic
disclosure also forms part of the invention. This includes the generic description
of the invention with a proviso or negative limitation removing any subject
matter from the genus, regardless of whether or not the excised material is
specifically recited herein.

WHAT IS CLAIMED:

1. A test set for characterizing substrate specificities of kinases comprising
at least two peptide pools, wherein substantially every peptide in each of
the peptide pools comprises one phosphorylatable amino acid position,
one query amino acid position, and at least one degenerate amino acid
position, and wherein:

(a) each peptide of every peptide pool has an identical
phosphorylatable amino acid that can be phosphorylated by a
kinase at the phosphorylatable amino acid position;

(b) the query amino acid position is at a defined position relative
to the phosphorylatable amino acid position within every
peptide of every peptide pool but a query amino acid's
identity at the query amino acid position is systematically
varied from one peptide pool to the next peptide pool within
the test set of peptide pools;

(c) each degenerate amino acid position within every peptide of
every peptide pool is occupied by an amino acid selected from
a defined mixture of amino acids; and

(d) the query amino acid position is not adjacent to the
phosphorylatable amino acid position in any peptide pool of
the test set.

2. The test set of claim 1, wherein at least one degenerate position in each
peptide pool in the test set is occupied by a defined mixture of more than
five amino acids.

3. The test set of claim 1, wherein the defined mixture comprises all natural
amino acids except cysteine.

4. The test set of claim 1, wherein each amino acid's relative abundance in
the defined mixture is approximately that amino acid's relative
abundance in the human proteome.

5. The test set of claim 1, wherein the defined mixture of amino acids
comprises arginine.

6. The test set of claim 1, wherein the test set has at least four peptide pools
and each of the four peptide pools have a different query amino acid.

7. The test set of claim 1, wherein the query amino acid position is two
positions N-terminal to the phosphorylatable amino acid position.

8. The test set of claim 1, wherein the query amino acid position is two
positions C-terminal to the phosphorylatable amino acid position.

9. The test set of claim 1, wherein one query amino acid is arginine.

10. The test set of claim 1, wherein each peptide pool is a soluble mixture of
peptides.

11. The test set of claim 1, wherein substantially every peptide in each
peptide pool is linked to biotin.

12. The test set of claim 1, wherein substantially every peptide in each
peptide pool is attached to a solid support.

13. The test set of claim 1 which also comprises at least one anchor amino
acid position, and wherein:

(a) each anchor amino acid position is at a defined position
relative to the phosphorylatable amino acid position
within every peptide of every peptide pool and each
anchor amino acid position has an identical anchor amino
acid at that anchor amino acid position within every
peptide of every peptide pool; and

(b) the query amino acid position is not adjacent to an anchor
amino acid position in any peptide pool of the test set.

14. The test set of claim 13, wherein at least one anchor amino acid is
arginine.

15. The test set of claim 13, wherein an anchor amino acid position is located
one position C-terminal to the phosphorylatable amino acid position.

16. The test set of claim 13, wherein an anchor amino acid position is located
three positions N-terminal to the phosphorylatable amino acid position.

17. The test set of claim 16, wherein arginine is the anchor amino acid at the
anchor amino acid position located three positions N-terminal to the
phosphorylatable amino acid position.

18. The test set of claim 13, wherein every peptide in each of the peptide
pools comprises less than four anchor amino acids.

19. A test set for characterizing substrate specificities of kinases comprising
at least two peptide pools, wherein every peptide in each of the peptide
pools comprises one phosphorylatable amino acid position, one query
amino acid, and at least one degenerate amino acid position, and wherein:

(a) each peptide of every peptide pool has an identical
phosphorylatable amino acid that can be phosphorylated
by a kinase at the phosphorylatable amino acid position;

(b) every peptide of every peptide pool has an identical query
amino acid but the position of the query amino acid
relative to the phosphorylatable amino acid position is
systematically varied from one peptide pool to the next
peptide pool within the test set of peptide pools; and

(c) each degenerate amino acid position within every peptide
of every peptide pool is occupied by an amino acid from a
defined mixture of amino acids.

20. The test set of claim 19, wherein the query amino acid is arginine.

21. The test set of claim 19, wherein each peptide of every peptide pool has
at least one anchor amino acid position that is at a defined position
relative to the phosphorylatable amino acid position, and wherein each
anchor amino acid position of peptides within a peptide pool has an
identical anchor amino acid at that anchor amino acid position.

22. The test set of claim 21, wherein the anchor amino acid is arginine and
the anchor amino acid position is two positions N-terminal to the
phosphorylatable amino acid position.

23. A test set of peptides for characterizing kinase substrate specificity
comprising at least 50 separate peptides, each peptide consisting
essentially of a sequence of between 6 and 30 amino acids, wherein each
peptide sequence is different from every other peptide sequence, and
wherein at least 50 peptides comprise two or more arginines within 6
amino acid positions of a serine or threonine.

24. The test set of claim 23, wherein the test set has at least 96 separate
peptides that comprise two or more arginines within 6 amino acid
positions of a serine or threonine.

25. The test set of claim 23, wherein at least half of the peptides comprise
two or more arginines within 6 residues of a serine or threonine.

26. The test set of claim 23, wherein at least 50 peptides comprise two or
more arginines but two of said arginines are not located 3 positions
N-terminal to the serine or threonine.

27. The test set of claim 23, wherein at least 50 peptides comprise three or
more arginine residues within 6 residues of a serine or threonine.

28. The test set of claim 23, wherein the at least 50 peptides further
comprise one or more lysine residues within 6 residues of a serine or
threonine.

29. The test set of claim 23, wherein substantially every peptide in the set
corresponds to a peptidyl sequence in a mammalian protein and the
peptidyl sequence is within 30 amino acids of the protein's N-terminus or
C-terminus.

30. A peptide set consisting essentially of two or more pools of peptides,
wherein each pool comprises peptides having substantially identical
peptide sequences and the peptide sequences in each pool are selected
from the group consisting essentially of SEQ ID NO: 76, 81, 82, 87, 89-
92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113, 121, 124, 127-129, 131-
134, 136, 139, 143, 144, 149, 151-154, 160, 163-171, 173-177, 179, 182-
192, 196-206, 208-211, 213-216, 474-516 or 517.

31. An isolated peptide consisting essentially of SEQ ID NO: 76, 81, 82, 87,
89-92, 94, 97-99, 102, 104, 105, 108, 110, 112, 113, 121, 124, 127-129,
131-134, 136, 139, 143, 144, 149, 151-154, 160, 163-171, 173-177, 179,
182-192, 196-206, 208-211, 213-216, 474-516 or 517.

32. The peptide of claim 31, wherein a serine or threonine in the peptide is
phosphorylated.

33. A binding entity whose binding differentiates between a peptide having
any one of SEQ ID NO: 76, 81, 82, 87, 89-92, 94, 97-99, 102, 104, 105,
108, 110, 112, 113, 121, 124, 127-129, 131-134, 136, 139, 143, 144, 149,
151-154, 160, 163-171, 173-177, 179, 182-192, 196-206, 208-211,
213-216, 474-517, and the peptide after phosphorylation by protein kinase C
theta; wherein the binding entity has substantially no binding to a
phosphorylated peptide having SEQ ID NO: 229 (WKN-pS-IRH).

34. The binding entity of claim 33, wherein the binding entity binds with
greater affinity to the peptide after phosphorylation than before
phosphorylation.

35. The binding entity of claim 33, wherein the binding entity binds with
greater affinity to the peptide before phosphorylation than after
phosphorylation.

36. The binding entity of claim 33, wherein the binding entity is an antibody,
an antibody fragment or a mixture thereof.

37. The binding entity of claim 33, wherein the peptide is part of a
mammalian protein.

38. The binding entity of claim 37, wherein the peptide's sequence is within
30 amino acids of the protein's N-terminus or C-terminus of said protein.

39. The binding entity of claim 38, wherein the peptide comprises any one of
SEQ ID NO: 89, 102, 110, 112, 127, 177, 182, 209, 474-488 or 489.

40. The binding entity of claim 38 where the peptide comprises any one of
SEQ ID NO: 173, 185, 192, 196, 200, 490-491 or 492.

41. The binding entity of claim 33 whose binding further differentiates
between a phosphorylated peptide having any one of SEQ ID NO: 298,
301-324, 326-347, 349-400, 402-410, 412-473, 571-643 or 644, and a
non-phosphorylated peptide that differs from the phosphorylated peptide
by substitution of Ser for the pSer or substitution of a Thr for the pThr.

42. The binding entity of claim 41, wherein the phosphorylated peptide
comprises any one of SEQ ID: 298, 320, 324, 350, 351, 366, 388, 394,
398, 402, 418, 464, 571-595 or 596.

43. The binding entity of claim 41, wherein the phosphorylated peptide
comprises any one of SEQ ID: 301, 310, 317, 322, 344, 352, 371, 406,
597-599 or 600.

44. The binding entity of claim 41, wherein the phosphorylated peptide
comprises SEQ ID NO: 298.

45. The binding entity of claim 41, wherein the phosphorylated peptide
comprises SEQ ID NO: 313 or 314.

46. The binding entity of claim 41, wherein the phosphorylated peptide
comprises SEQ ID NO: 361 or 362.

47. A method for characterizing substrate specificities of kinases comprising: 
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(a) contacting each peptide pool in at least two test sets of 
peptide pools with ATP and a kinase; 

(b) quantifying the amount of phosphorylation in each peptide 
pool; and 

(c) comparing the amount of phosphorylation in each peptide 
pool with the amount of phosphorylation in at least one 
other peptide pool; 

wherein substantially every peptide in each of the peptide pools 
comprises one phosphorylatable amino acid position, one query 
amino acid position, and at least one degenerate amino acid 
position; and wherein 

each peptide of every peptide pool has an identical 
phosphorylatable amino acid that can be phosphorylated by a 
kinase at the phosphorylatable amino acid position; 

the query amino acid position is at a defined position relative to the 
phosphorylatable amino acid position within every peptide of 
every peptide pool but a query amino acid's identity at the query 
amino acid position is systematically varied from one peptide 
pool to the next peptide pool within the test set of peptide pools; 
and 

each degenerate amino acid position within every peptide of every 
peptide pool is occupied by an amino acid from a defined mixture 
of amino acids. 

48. The method of claim 47, wherein quantifying the amount of 
phosphorylation comprises determining a total amount of labeled 
phosphate incorporated into each peptide pool. 

49. The method of claim 47, wherein quantifying the amount of 
phosphorylation comprises determining a total amount of phosphorylated 
peptide in each peptide pool with an antibody specific for a 
phosphorylated peptide. 
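
By way of non-limiting illustration, the comparison of step (c) of claim 47 might be sketched in Python as follows. The function name `compare_pools`, the dictionary input, and the use of a log2 ratio against the mean of the other pools in the same test set are assumptions made for this sketch only; the claim requires merely that each pool be compared with at least one other pool.

```python
import math


def compare_pools(signals):
    """Compare each peptide pool with the other pools of one test set.

    `signals` maps the query amino acid of each pool (one-letter code) to
    the measured amount of phosphorylation for that pool. All pools are
    assumed to share the same query position. Returns the log2 ratio of
    each pool's signal to the mean signal of the remaining pools.
    """
    if len(signals) < 2:
        raise ValueError("need at least two pools to compare")
    if any(value <= 0 for value in signals.values()):
        raise ValueError("signals must be positive to take a log ratio")

    ratios = {}
    for amino_acid, value in signals.items():
        # Hypothetical reference: the mean of every other pool in the set.
        others = [v for k, v in signals.items() if k != amino_acid]
        mean_others = sum(others) / len(others)
        ratios[amino_acid] = math.log2(value / mean_others)
    return ratios


# Illustrative, made-up measurements for a single query position.
example = compare_pools({"R": 400.0, "A": 100.0, "D": 100.0})
```

Under these assumptions a positive value marks a query amino acid that is favored at the query position and a negative value marks one that is disfavored.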

50. A method for visual display of amino acid or nucleotide sequence 
preferences comprising a series of stacks of single letter symbols for 
amino acids or nucleotides, wherein 
(a) each stack represents a position in a peptide or a nucleic 
acid sequence; 

(b) each symbol's height is proportional to the absolute value 
of a quantitative parameter that is positive for favored 
amino acids or nucleotides and negative for disfavored 
amino acids or nucleotides; and 

(c) each symbol's position within the stack is sorted from 
bottom to top in ascending value by the quantitative 
parameter. 
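
By way of non-limiting illustration, the stack geometry recited in claim 50 might be computed as in the following Python sketch. The function name `stack_layout`, the `scale` parameter, and the placement of a zero baseline between the disfavored and favored symbols are assumptions of this sketch and are not recited in the claim.

```python
def stack_layout(scores, scale=1.0):
    """Lay out one stack of single-letter symbols.

    `scores` maps each symbol to its quantitative parameter (positive for
    favored, negative for disfavored). Returns a list of
    (symbol, y_bottom, height) tuples ordered from bottom to top.
    """
    # (c) sort symbols from bottom to top in ascending value.
    ordered = sorted(scores.items(), key=lambda item: item[1])

    # Assumed convention: start below zero by the total height of the
    # disfavored symbols, so that favored symbols begin at y = 0.
    y = -scale * sum(-value for _, value in ordered if value < 0)

    layout = []
    for symbol, value in ordered:
        # (b) height is proportional to the absolute value of the parameter.
        height = scale * abs(value)
        layout.append((symbol, y, height))
        y += height
    return layout


# Illustrative, made-up parameters for one position.
example = stack_layout({"R": 2.0, "D": -1.0, "A": 0.5, "P": -0.5})
```

With this convention the most disfavored symbol sits at the bottom of the stack and the most favored symbol at the top; one such stack would be computed for each position of the peptide or nucleic acid sequence.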






WO 2005/028666 



PCT/US2004/029397 



[Drawing sheets 1/45 to 16/45: figure content is not recoverable from the text extraction. Legible axis labels appear to include "Measured phosphorylation" and "Experimentally determined".]



[Drawing sheet 17/45: table scoring a sequence from the MARCKS protein. Legible labels include "Protein", "MARCKS" and "Total Raw Score"; the remaining content is not recoverable from the text extraction.]




[Drawing sheets 18/45 to 45/45: figure content is not recoverable from the text extraction. Legible axis labels appear to include "Measured phosphorylation", "Position" and "Log2 score".]







